DQM: Decentralized Quadratically Approximated 
Alternating Direction Method of Multipliers 

Aryan Mokhtari, Wei Shi, Qing Ling, and Alejandro Ribeiro 


Abstract: This paper considers decentralized consensus optimization
problems where nodes of a network have access to different
summands of a global objective function. Nodes cooperate to minimize
the global objective by exchanging information with neighbors
only. A decentralized version of the alternating directions method
of multipliers (DADMM) is a common method for solving this
category of problems. DADMM exhibits linear convergence rate to
the optimal objective but its implementation requires solving a convex
optimization problem at each iteration. This can be computationally
costly and may result in large overall convergence times. The
decentralized quadratically approximated ADMM algorithm (DQM),
which minimizes a quadratic approximation of the objective function
that DADMM minimizes at each iteration, is proposed here. The
consequent reduction in computational time is shown to have minimal
effect on convergence properties. Convergence still proceeds at a
linear rate with a guaranteed constant that is asymptotically equivalent
to the DADMM linear convergence rate constant. Numerical results
demonstrate advantages of DQM relative to DADMM and other
alternatives in a logistic regression problem.

Index Terms: Multi-agent network, decentralized optimization,
Alternating Direction Method of Multipliers.

I. Introduction

Decentralized algorithms are used to solve optimization problems
where components of the objective are available at different
nodes of a network. Nodes access their local cost functions only
but try to minimize the aggregate cost by exchanging information
with their neighbors. Specifically, consider a variable
$\tilde{\mathbf{x}} \in \mathbb{R}^p$ and a connected network containing
$n$ nodes each of which has access to a local cost function
$f_i : \mathbb{R}^p \to \mathbb{R}$. The nodes' goal is to find
the optimal argument of the global cost function
$\sum_{i=1}^{n} f_i(\tilde{\mathbf{x}})$,

$$ \tilde{\mathbf{x}}^* = \operatorname*{argmin}_{\tilde{\mathbf{x}}} \sum_{i=1}^{n} f_i(\tilde{\mathbf{x}}). \tag{1} $$

Problems of this form arise in, e.g., decentralized control [?],
wireless communication [?], sensor networks [?], and
large scale machine learning [?]. In this paper we assume
that the local costs $f_i$ are twice differentiable and strongly convex.

There are different algorithms to solve (1) in a decentralized
manner which can be divided into two major categories: the ones
that operate in the primal domain and the ones that operate in
the dual domain. Among primal domain algorithms, decentralized
(sub)gradient descent (DGD) methods are well studied [13]-[15].
They can be interpreted as either a mix of local gradient descent
steps with successive averaging or as a penalized version of (1)
with a penalty term that encourages agreement between adjacent
nodes. This latter interpretation has been exploited to develop the
network Newton (NN) methods that attempt to approximate the
Newton step of this penalized objective in a distributed manner
[?]. The methods that operate in the dual domain consider
a constraint that enforces equality between nodes' variables. They
then ascend on the dual function to find optimal Lagrange
multipliers with the solution of (1) obtained as a byproduct [?].
Among dual domain methods, decentralized implementation
of the alternating directions method of multipliers (ADMM),
known as DADMM, is proven to be very efficient with respect to
convergence time [?].

A fundamental distinction between primal methods such as 
DGD and NN and dual domain methods such as DADMM is 
that the former compute local gradients and Hessians at each 
iteration while the latter minimize local pieces of the Lagrangian 
at each step - this is necessary since the gradient of the dual 
function is determined by Lagrangian minimizers. Thus, iterations 
in dual domain methods are, in general, more costly because they 
require solution of a convex optimization problem. However, dual 
methods also converge in a smaller number of iterations because
they compute approximations to $\mathbf{x}^*$ instead of descending
towards $\mathbf{x}^*$. Having complementary advantages, the choice
between primal and dual methods depends on the relative cost of
computation and communication for specific problems and platforms.
Alternatively, one can think of developing methods that combine the
advantages of ascending in the dual domain without requiring solution
of an optimization problem at each iteration. This can be accomplished
by the decentralized linearized ADMM (DLM) algorithm [?], [22],
which replaces the minimization of a convex objective
required by ADMM with the minimization of a first order linear
approximation of the objective function. This yields per-iteration
problems that can be solved with a computational cost akin to
the computation of a gradient and a method with convergence
properties closer to DADMM than DGD.

If a first order approximation of the objective is useful, a 
second order approximation should decrease convergence times 
further. The decentralized quadratically approximated ADMM 
(DQM) algorithm that we propose here minimizes a quadratic 
approximation of the Lagrangian minimization of each ADMM 
step. This quadratic approximation requires computation of local 
Hessians but results in an algorithm with convergence properties 
that are: (i) better than the convergence properties of DLM; (ii)
asymptotically identical to the convergence behavior of DADMM. 
The technical contribution of this paper is to prove that (i) and 
(ii) are true from both analytical and practical perspectives. 

We begin the paper by discussing solution of (1) with DADMM
and its linearized version DLM (Section II). Both of these
algorithms perform updates on dual and primal auxiliary variables
that are identical and computationally simple. They differ in
the manner in which principal primal variables are updated.
DADMM solves a convex optimization problem and DLM solves
a regularized linear approximation. We follow with an explanation
of DQM that differs from DADMM and DLM in that it minimizes
a quadratic approximation of the convex problem that DADMM
solves exactly and DLM approximates linearly (Section III). We
also explain how DQM can be implemented in a distributed
manner (Proposition 1 and Algorithm 1). Convergence properties
of DQM are then analyzed (Section IV) where linear convergence
is established (Theorem 1 and Corollary 1). Key in the analysis is
the error incurred when approximating the exact minimization of
DADMM with the quadratic approximation of DQM. This error
is shown to decrease as iterations progress (Proposition 2) faster
than the rate that the error of DLM approaches zero
(Proposition 3). This results in DQM having a guaranteed convergence
constant strictly smaller than the DLM constant that approaches
the guaranteed constant of DADMM for large iteration index
(Section IV-A). We corroborate analytical results with numerical
evaluations in a logistic regression problem (Section V). We show
that DQM does outperform DLM and show that convergence paths
of DQM and DADMM are almost identical (Section V-A). Overall
computational cost of DQM is shown to be smaller, as expected.


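Notation. Vectors are written as $\mathbf{x} \in \mathbb{R}^n$ and
matrices as $\mathbf{A} \in \mathbb{R}^{n \times n}$. Given $n$ vectors
$\mathbf{x}_i$, the vector $\mathbf{x} = [\mathbf{x}_1; \ldots; \mathbf{x}_n]$
represents a stacking of the elements of each individual $\mathbf{x}_i$.
We use $\|\mathbf{x}\|$ to denote the Euclidean norm of vector $\mathbf{x}$
and $\|\mathbf{A}\|$ to denote the Euclidean norm of matrix $\mathbf{A}$.
The gradient of a function $f$ at point $\mathbf{x}$ is denoted as
$\nabla f(\mathbf{x})$ and the Hessian is denoted as $\nabla^2 f(\mathbf{x})$.
We use $\sigma(\mathbf{B})$ to denote the singular values of matrix
$\mathbf{B}$ and $\lambda(\mathbf{A})$ to denote the eigenvalues of
matrix $\mathbf{A}$.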


II. Distributed Alternating Directions Method of
Multipliers

Consider a connected network with $n$ nodes and $m$ edges where
the set of nodes is $\mathcal{V} = \{1, \ldots, n\}$ and the set of
ordered edges $\mathcal{E}$ contains pairs $(i,j)$ indicating that $i$
can communicate to $j$. We restrict attention to symmetric networks in
which $(i,j) \in \mathcal{E}$ if and only if $(j,i) \in \mathcal{E}$ and
define node $i$'s neighborhood as the set
$\mathcal{N}_i = \{j \mid (i,j) \in \mathcal{E}\}$. In problem (1) agent
$i$ has access to the local objective function $f_i(\tilde{\mathbf{x}})$
and agents cooperate to minimize the global cost
$\sum_{i=1}^{n} f_i(\tilde{\mathbf{x}})$. This specification is
more naturally formulated by defining variables $\mathbf{x}_i$
representing the local copies of the variable $\tilde{\mathbf{x}}$. We
also define the auxiliary variables $\mathbf{z}_{ij}$ associated with
edge $(i,j) \in \mathcal{E}$ and rewrite (1) as

$$ \{\mathbf{x}_i^*\}_{i=1}^{n} := \operatorname*{argmin}_{\mathbf{x}} \sum_{i=1}^{n} f_i(\mathbf{x}_i), \quad \text{s.t. } \mathbf{x}_i = \mathbf{z}_{ij},\ \mathbf{x}_j = \mathbf{z}_{ij}, \text{ for all } (i,j) \in \mathcal{E}. \tag{2} $$

The constraints $\mathbf{x}_i = \mathbf{z}_{ij}$ and
$\mathbf{x}_j = \mathbf{z}_{ij}$ enforce that the variable $\mathbf{x}_i$
of each node $i$ is equal to the variables $\mathbf{x}_j$ of its
neighbors $j \in \mathcal{N}_i$. This condition in association with
network connectivity implies that a set of variables
$\{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ is feasible for problem (2)
if and only if all the variables $\mathbf{x}_i$ are equal to each other,
i.e., if $\mathbf{x}_1 = \cdots = \mathbf{x}_n$. Therefore, problems (1)
and (2) are equivalent in the sense that for all $i$ and $j$ the optimal
arguments of (2) satisfy $\mathbf{x}_i^* = \tilde{\mathbf{x}}^*$ and
$\mathbf{z}_{ij}^* = \tilde{\mathbf{x}}^*$, where $\tilde{\mathbf{x}}^*$
is the optimal argument of (1).

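To write problem (2) in a matrix form, define
$\mathbf{A}_s \in \mathbb{R}^{mp \times np}$ as the block source matrix
which contains $m \times n$ square blocks
$(\mathbf{A}_s)_{e,i} \in \mathbb{R}^{p \times p}$. The block
$(\mathbf{A}_s)_{e,i}$ is not identically null if and only if the edge
$e$ corresponds to $e = (i,j) \in \mathcal{E}$ in which case
$(\mathbf{A}_s)_{e,i} = \mathbf{I}_p$. Likewise, the block destination
matrix $\mathbf{A}_d \in \mathbb{R}^{mp \times np}$ contains $m \times n$
square blocks $(\mathbf{A}_d)_{e,i} \in \mathbb{R}^{p \times p}$. The
square block $(\mathbf{A}_d)_{e,i} = \mathbf{I}_p$ when $e$ corresponds
to $e = (j,i) \in \mathcal{E}$ and is null otherwise. Further define
$\mathbf{x} := [\mathbf{x}_1; \ldots; \mathbf{x}_n] \in \mathbb{R}^{np}$
as a vector concatenating all local variables $\mathbf{x}_i$, the vector
$\mathbf{z} := [\mathbf{z}_1; \ldots; \mathbf{z}_m] \in \mathbb{R}^{mp}$
concatenating all auxiliary variables $\mathbf{z}_e = \mathbf{z}_{ij}$,
and the aggregate function $f : \mathbb{R}^{np} \to \mathbb{R}$ as
$f(\mathbf{x}) := \sum_{i=1}^{n} f_i(\mathbf{x}_i)$. We can then rewrite
(2) as

$$ \mathbf{x}^* := \operatorname*{argmin}_{\mathbf{x}} f(\mathbf{x}), \quad \text{s.t. } \mathbf{A}_s\mathbf{x} - \mathbf{z} = \mathbf{0},\ \mathbf{A}_d\mathbf{x} - \mathbf{z} = \mathbf{0}. \tag{3} $$

Define now the matrix
$\mathbf{A} = [\mathbf{A}_s; \mathbf{A}_d] \in \mathbb{R}^{2mp \times np}$
which stacks the source and destination matrices, and the matrix
$\mathbf{B} = [-\mathbf{I}_{mp}; -\mathbf{I}_{mp}] \in \mathbb{R}^{2mp \times mp}$
which stacks two negative identity matrices of size $mp$ to rewrite (3) as

$$ \mathbf{x}^* := \operatorname*{argmin}_{\mathbf{x}} f(\mathbf{x}), \quad \text{s.t. } \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{z} = \mathbf{0}. \tag{4} $$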
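As a concrete illustration of the matrices in (3) and (4), the following minimal sketch builds the source and destination matrices for scalar variables ($p = 1$, so every block is the $1 \times 1$ identity) and evaluates the constraint residual $\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{z}$. The three-node path graph, the 0-based node labels, and the helper names `incidence_blocks`, `matvec`, and `constraint_residual` are assumptions made for this example only.

```python
# Sketch for p = 1: each block of A_s and A_d is the scalar 1 or 0.
# The graph and all helper names are hypothetical illustrations.

def incidence_blocks(n, edges):
    """Return (A_s, A_d) as m x n lists of lists for ordered edges (i, j)."""
    m = len(edges)
    A_s = [[0] * n for _ in range(m)]
    A_d = [[0] * n for _ in range(m)]
    for e, (i, j) in enumerate(edges):
        A_s[e][i] = 1  # node i is the source of edge e = (i, j)
        A_d[e][j] = 1  # node j is the destination of edge e = (i, j)
    return A_s, A_d


def matvec(A, x):
    """Plain matrix-vector product."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def constraint_residual(A_s, A_d, x, z):
    """Residual Ax + Bz with A = [A_s; A_d] and B = [-I; -I]."""
    top = [r - ze for r, ze in zip(matvec(A_s, x), z)]
    bottom = [r - ze for r, ze in zip(matvec(A_d, x), z)]
    return top + bottom


# Symmetric ordered edge set of the path 0 - 1 - 2.
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
A_s, A_d = incidence_blocks(3, edges)

# A consensus point is feasible: every residual entry is zero.
residual_consensus = constraint_residual(A_s, A_d, [5.0] * 3, [5.0] * 4)

# A non-consensus point cannot satisfy both blocks of constraints.
x_bad = [1.0, 2.0, 3.0]
residual_bad = constraint_residual(A_s, A_d, x_bad, matvec(A_s, x_bad))
```

In the second call $\mathbf{z}$ is chosen to satisfy the source constraints exactly, so any remaining residual comes from the destination constraints, which is the mechanism by which (4) forces agreement between neighbors.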

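DADMM is the application of ADMM to solve (4). To develop
this algorithm introduce Lagrange multipliers
$\boldsymbol{\alpha}_e = \boldsymbol{\alpha}_{ij}$ and
$\boldsymbol{\beta}_e = \boldsymbol{\beta}_{ij}$ associated with the
constraints $\mathbf{x}_i = \mathbf{z}_{ij}$ and
$\mathbf{x}_j = \mathbf{z}_{ij}$ in (2), respectively. Define
$\boldsymbol{\alpha} := [\boldsymbol{\alpha}_1; \ldots; \boldsymbol{\alpha}_m]$
as the concatenation of the multipliers $\boldsymbol{\alpha}_e$ which
yields the multiplier of the constraint
$\mathbf{A}_s\mathbf{x} - \mathbf{z} = \mathbf{0}$ in (3). Likewise, the
corresponding Lagrange multiplier of the constraint
$\mathbf{A}_d\mathbf{x} - \mathbf{z} = \mathbf{0}$ in (3) can be obtained
by stacking the multipliers $\boldsymbol{\beta}_e$ to define
$\boldsymbol{\beta} := [\boldsymbol{\beta}_1; \ldots; \boldsymbol{\beta}_m]$.
Grouping $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ into
$\boldsymbol{\lambda} := [\boldsymbol{\alpha}; \boldsymbol{\beta}] \in \mathbb{R}^{2mp}$
leads to the Lagrange multiplier $\boldsymbol{\lambda}$ associated with
the constraint $\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{z} = \mathbf{0}$
in (4). Using these definitions and introducing a positive constant
$c > 0$ we write the augmented Lagrangian of (4) as

$$ \mathcal{L}(\mathbf{x}, \mathbf{z}, \boldsymbol{\lambda}) := f(\mathbf{x}) + \boldsymbol{\lambda}^T(\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{z}) + \frac{c}{2}\|\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{z}\|^2. \tag{5} $$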

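The idea of ADMM is to minimize the Lagrangian
$\mathcal{L}(\mathbf{x}, \mathbf{z}, \boldsymbol{\lambda})$
with respect to $\mathbf{x}$, follow by minimizing the updated Lagrangian
with respect to $\mathbf{z}$, and finish each iteration with an update of
the multiplier $\boldsymbol{\lambda}$ using dual ascent. To be more
precise, consider the time index $k \in \mathbb{N}$ and define
$\mathbf{x}_k$, $\mathbf{z}_k$, and $\boldsymbol{\lambda}_k$ as the
iterates at step $k$. At this step, the augmented Lagrangian is minimized
with respect to $\mathbf{x}$ to obtain the iterate

$$ \mathbf{x}_{k+1} = \operatorname*{argmin}_{\mathbf{x}}\ f(\mathbf{x}) + \boldsymbol{\lambda}_k^T(\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{z}_k) + \frac{c}{2}\|\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{z}_k\|^2. \tag{6} $$

Then, the augmented Lagrangian is minimized with respect to the
auxiliary variable $\mathbf{z}$ using the updated variable
$\mathbf{x}_{k+1}$ to obtain

$$ \mathbf{z}_{k+1} = \operatorname*{argmin}_{\mathbf{z}}\ f(\mathbf{x}_{k+1}) + \boldsymbol{\lambda}_k^T(\mathbf{A}\mathbf{x}_{k+1} + \mathbf{B}\mathbf{z}) + \frac{c}{2}\|\mathbf{A}\mathbf{x}_{k+1} + \mathbf{B}\mathbf{z}\|^2. \tag{7} $$

After updating the variables $\mathbf{x}$ and $\mathbf{z}$, the Lagrange
multiplier $\boldsymbol{\lambda}_k$ is updated through the dual ascent
iteration

$$ \boldsymbol{\lambda}_{k+1} = \boldsymbol{\lambda}_k + c\,(\mathbf{A}\mathbf{x}_{k+1} + \mathbf{B}\mathbf{z}_{k+1}). \tag{8} $$

The DADMM algorithm is obtained by observing that the structure
of the matrices $\mathbf{A}$ and $\mathbf{B}$ is such that (6)-(8) can be
implemented in a distributed manner [?], [18], [?].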

The updates for the auxiliary variable $\mathbf{z}$ and the Lagrange
multiplier $\boldsymbol{\lambda}$ are not costly in terms of computation
time. However, updating the primal variable $\mathbf{x}$ can be expensive
as it entails the solution of an optimization problem [cf. (6)]. The DLM
algorithm avoids this cost with an inexact update of the primal variable
iterate $\mathbf{x}_{k+1}$. This inexact update relies on approximating
the aggregate function value $f(\mathbf{x}_{k+1})$ in (6) through a
regularized linearization of the aggregate function $f$ in a neighborhood
of the current variable $\mathbf{x}_k$. This regularized approximation
takes the form
$f(\mathbf{x}) \approx f(\mathbf{x}_k) + \nabla f(\mathbf{x}_k)^T(\mathbf{x} - \mathbf{x}_k) + (\rho/2)\|\mathbf{x} - \mathbf{x}_k\|^2$
for a given positive constant $\rho > 0$. Consequently, the update formula
for the primal variable $\mathbf{x}$ in DLM replaces the DADMM exact
minimization in (6) by the minimization of the quadratic form

$$ \mathbf{x}_{k+1} = \operatorname*{argmin}_{\mathbf{x}}\ f(\mathbf{x}_k) + \nabla f(\mathbf{x}_k)^T(\mathbf{x} - \mathbf{x}_k) + \frac{\rho}{2}\|\mathbf{x} - \mathbf{x}_k\|^2 + \boldsymbol{\lambda}_k^T(\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{z}_k) + \frac{c}{2}\|\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{z}_k\|^2. \tag{9} $$

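The first order optimality condition for (9) implies that the updated
variable $\mathbf{x}_{k+1}$ satisfies

$$ \nabla f(\mathbf{x}_k) + \rho(\mathbf{x}_{k+1} - \mathbf{x}_k) + \mathbf{A}^T\boldsymbol{\lambda}_k + c\mathbf{A}^T(\mathbf{A}\mathbf{x}_{k+1} + \mathbf{B}\mathbf{z}_k) = \mathbf{0}. \tag{10} $$

According to (10), the updated variable $\mathbf{x}_{k+1}$ can be computed
by inverting the positive definite matrix
$\rho\mathbf{I} + c\mathbf{A}^T\mathbf{A}$. This update
can also be implemented in a distributed manner.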
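A short worked rearrangement may make the inversion concrete. Collecting the terms of (10) that contain $\mathbf{x}_{k+1}$ gives the explicit linear system below. The remark on the structure of $\mathbf{A}^T\mathbf{A}$ is an observation that follows from the definitions of $\mathbf{A}_s$ and $\mathbf{A}_d$ on a symmetric edge set, with $d_i$ denoting the number of neighbors of node $i$; the operator name `blkdiag` is notation introduced only for this remark.

```latex
% Rearranging (10): move every term without x_{k+1} to the right-hand side.
(\rho\mathbf{I} + c\mathbf{A}^T\mathbf{A})\,\mathbf{x}_{k+1}
  = \rho\,\mathbf{x}_k - \nabla f(\mathbf{x}_k)
    - \mathbf{A}^T\boldsymbol{\lambda}_k
    - c\,\mathbf{A}^T\mathbf{B}\,\mathbf{z}_k .

% Each ordered edge has exactly one source and one destination, and node i
% is the source of d_i edges and the destination of d_i edges, so
\mathbf{A}^T\mathbf{A}
  = \mathbf{A}_s^T\mathbf{A}_s + \mathbf{A}_d^T\mathbf{A}_d
  = 2\,\mathrm{blkdiag}(d_1\mathbf{I}_p, \ldots, d_n\mathbf{I}_p).
```

Under this observation the matrix to invert is block diagonal with $i$-th block $(\rho + 2cd_i)\mathbf{I}_p$, which is why the primal step only needs quantities available at node $i$ and at its neighbors.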


The sequence of variables $\mathbf{x}_k$ generated by DLM converges
linearly to the optimal argument $\mathbf{x}^*$ [?]. Although this is the
same rate as DADMM, the guaranteed linear convergence constant of DLM is
worse than the one for DADMM (see Section IV-A), and can
be much worse depending on the condition number of the local
functions $f_i$ (see Section V-A). To close the gap between these
constants we can use a second order approximation of $f$ in (6). This
is the idea of DQM that we introduce in the following section.


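III. DQM: Decentralized Quadratically
Approximated ADMM

DQM uses a local quadratic approximation of the primal function
$f(\mathbf{x})$ around the current iterate $\mathbf{x}_k$. If we let
$\mathbf{H}_k := \nabla^2 f(\mathbf{x}_k)$ denote the primal function
Hessian evaluated at $\mathbf{x}_k$ the quadratic approximation of $f$ at
$\mathbf{x}_k$ is
$f(\mathbf{x}) \approx f(\mathbf{x}_k) + \nabla f(\mathbf{x}_k)^T(\mathbf{x} - \mathbf{x}_k) + (1/2)(\mathbf{x} - \mathbf{x}_k)^T\mathbf{H}_k(\mathbf{x} - \mathbf{x}_k)$.
Using this approximation in (6) yields the DQM update that we therefore
define as

$$ \mathbf{x}_{k+1} := \operatorname*{argmin}_{\mathbf{x}}\ f(\mathbf{x}_k) + \nabla f(\mathbf{x}_k)^T(\mathbf{x} - \mathbf{x}_k) + \frac{1}{2}(\mathbf{x} - \mathbf{x}_k)^T\mathbf{H}_k(\mathbf{x} - \mathbf{x}_k) + \boldsymbol{\lambda}_k^T(\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{z}_k) + \frac{c}{2}\|\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{z}_k\|^2. \tag{11} $$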

Comparison of (9) and (11) shows that in DLM the quadratic term
$(\rho/2)\|\mathbf{x}_{k+1} - \mathbf{x}_k\|^2$ is added to the
first-order approximation of the primal objective function, while in DQM
the second order approximation of the primal objective function is used
to reach a more accurate approximation for $f(\mathbf{x})$. Since (11) is
a quadratic program, the first order optimality condition yields a system
of linear equations that can be solved to find $\mathbf{x}_{k+1}$,

$$ \nabla f(\mathbf{x}_k) + \mathbf{H}_k(\mathbf{x}_{k+1} - \mathbf{x}_k) + \mathbf{A}^T\boldsymbol{\lambda}_k + c\mathbf{A}^T(\mathbf{A}\mathbf{x}_{k+1} + \mathbf{B}\mathbf{z}_k) = \mathbf{0}. \tag{12} $$

This update can be solved by inverting the matrix
$\mathbf{H}_k + c\mathbf{A}^T\mathbf{A}$
which is invertible if, as we are assuming, $f(\mathbf{x})$ is strongly
convex.

The DADMM updates in (7) and (8) are used verbatim in
DQM, which is therefore defined by recursive application of
(11), (7), and (8). It is customary to consider the first order
optimality conditions of (7) and to reorder terms in (8) to rewrite
the respective updates as

$$ \mathbf{B}^T\boldsymbol{\lambda}_k + c\mathbf{B}^T(\mathbf{A}\mathbf{x}_{k+1} + \mathbf{B}\mathbf{z}_{k+1}) = \mathbf{0}, \qquad \boldsymbol{\lambda}_{k+1} - \boldsymbol{\lambda}_k - c(\mathbf{A}\mathbf{x}_{k+1} + \mathbf{B}\mathbf{z}_{k+1}) = \mathbf{0}. \tag{13} $$

DQM is then equivalently defined by recursive solution of the
system of linear equations in (12) and (13). This system, as is the
case of DADMM and DLM, can be reworked into a simpler form
that reduces communication cost. To derive this simpler form we
assume a specific structure for the initial vectors
$\boldsymbol{\lambda}_0 = [\boldsymbol{\alpha}_0; \boldsymbol{\beta}_0]$,
$\mathbf{x}_0$, and $\mathbf{z}_0$ as introduced in the following
assumption.

Assumption 1 Define the oriented incidence matrix as
$\mathbf{E}_o := \mathbf{A}_s - \mathbf{A}_d$ and the unoriented incidence
matrix as $\mathbf{E}_u := \mathbf{A}_s + \mathbf{A}_d$. The initial
Lagrange multipliers $\boldsymbol{\alpha}_0$ and $\boldsymbol{\beta}_0$,
and the initial variables $\mathbf{x}_0$ and $\mathbf{z}_0$ are chosen
such that:

(a) The multipliers are opposites of each other,
$\boldsymbol{\alpha}_0 = -\boldsymbol{\beta}_0$.

(b) The initial primal variables satisfy
$\mathbf{E}_u\mathbf{x}_0 = 2\mathbf{z}_0$.

(c) The initial multiplier $\boldsymbol{\alpha}_0$ lies in the column
space of $\mathbf{E}_o$.

Assumption 1 is minimally restrictive. The only non-elementary
condition is (c) but that can be satisfied by
$\boldsymbol{\alpha}_0 = \mathbf{0}$. Nulling all other variables, i.e.,
making $\boldsymbol{\beta}_0 = \mathbf{0}$, $\mathbf{x}_0 = \mathbf{0}$,
and $\mathbf{z}_0 = \mathbf{0}$ is a trivial choice to comply with
conditions (a) and (b) as well. An important consequence of this
initialization choice is that if the conditions in Assumption 1 are true
at time $k = 0$ they stay true for all subsequent iterations $k > 0$ as
we state next.

Lemma 1 Consider the DQM algorithm as defined by (12)-(13).
If Assumption 1 holds, then for all $k \ge 0$ the Lagrange multipliers
$\boldsymbol{\alpha}_k$ and $\boldsymbol{\beta}_k$, and the variables
$\mathbf{x}_k$ and $\mathbf{z}_k$ satisfy:

(a) The multipliers are opposites of each other,
$\boldsymbol{\alpha}_k = -\boldsymbol{\beta}_k$.

(b) The primal variables satisfy
$\mathbf{E}_u\mathbf{x}_k = 2\mathbf{z}_k$.

(c) The multiplier $\boldsymbol{\alpha}_k$ lies in the column space of
$\mathbf{E}_o$.

Proof: See Appendix A. ■

The validity of (c) in Lemma 1 is important for the convergence
analysis of Section IV. The validity of (a) and (b) means that
maintaining multipliers $\boldsymbol{\alpha}_k$ and $\boldsymbol{\beta}_k$
is redundant because they are opposites and that maintaining variables
$\mathbf{z}_k$ is also redundant because they can be computed as
$\mathbf{z}_k = \mathbf{E}_u\mathbf{x}_k/2$. It is then possible
to replace (12)-(13) by a simpler system of linear equations as we
explain in the following proposition.

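Proposition 1 Consider the DQM algorithm as defined by (12)-(13)
and define the sequence
$\boldsymbol{\phi}_k := \mathbf{E}_o^T\boldsymbol{\alpha}_k$. Further
define the unoriented Laplacian as
$\mathbf{L}_u := (1/2)\mathbf{E}_u^T\mathbf{E}_u$, the oriented
Laplacian as $\mathbf{L}_o = (1/2)\mathbf{E}_o^T\mathbf{E}_o$, and the
degree matrix as $\mathbf{D} := (\mathbf{L}_u + \mathbf{L}_o)/2$. If
Assumption 1 holds true, the DQM iterates $\mathbf{x}_k$ can be generated
as

$$ \mathbf{x}_{k+1} = (2c\mathbf{D} + \mathbf{H}_k)^{-1}\left[(c\mathbf{L}_u + \mathbf{H}_k)\mathbf{x}_k - \nabla f(\mathbf{x}_k) - \boldsymbol{\phi}_k\right], \qquad \boldsymbol{\phi}_{k+1} = \boldsymbol{\phi}_k + c\mathbf{L}_o\mathbf{x}_{k+1}. \tag{14} $$

Proof: See Appendix B. ■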

Proposition 1 states that by introducing the sequence of variables
$\boldsymbol{\phi}_k$, the DQM primal iterates $\mathbf{x}_k$ can be
computed through the recursive expressions in (14). These recursions are
simpler than (12)-(13) because they eliminate the auxiliary variables
$\mathbf{z}_k$ and reduce the dimensionality of $\boldsymbol{\lambda}_k$
- twice the number of edges - to that of $\boldsymbol{\phi}_k$ - the
number of nodes. Further observe that if (14) is used for implementation
we don't have to make sure that the conditions of Assumption 1 are
satisfied. We just need to pick
$\boldsymbol{\phi}_0 := \mathbf{E}_o^T\boldsymbol{\alpha}_0$ for some
$\boldsymbol{\alpha}_0$ in the column space of $\mathbf{E}_o$ - which is
not difficult, we can use, e.g., $\boldsymbol{\phi}_0 = \mathbf{0}$. The
role of Assumption 1 is to state conditions for which the expressions in
(12)-(13) are an equivalent representation of (14) that we use for
convergence analyses.
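The matrices that appear in (14) can be formed directly from an edge list. The sketch below, for scalar variables ($p = 1$), accumulates $(1/2)\mathbf{E}_u^T\mathbf{E}_u$ and $(1/2)\mathbf{E}_o^T\mathbf{E}_o$ edge by edge and recovers the degree matrix as their average. The path graph and the function name `laplacians` are assumptions made for illustration.

```python
# Sketch for p = 1. The row of E_u for edge (i, j) has +1 at i and +1 at j;
# the row of E_o has +1 at i (source) and -1 at j (destination).

def laplacians(n, edges):
    """Return (L_u, L_o, D) as n x n lists for a symmetric ordered edge list."""
    L_u = [[0.0] * n for _ in range(n)]
    L_o = [[0.0] * n for _ in range(n)]
    for (i, j) in edges:
        entries = ((i, 1.0, 1.0), (j, 1.0, -1.0))  # (node, E_u entry, E_o entry)
        for a, ua, oa in entries:
            for b, ub, ob in entries:
                L_u[a][b] += 0.5 * ua * ub  # contribution to (1/2) E_u^T E_u
                L_o[a][b] += 0.5 * oa * ob  # contribution to (1/2) E_o^T E_o
    D = [[(L_u[a][b] + L_o[a][b]) / 2.0 for b in range(n)] for a in range(n)]
    return L_u, L_o, D


# Symmetric ordered edge set of the path 0 - 1 - 2 (hypothetical example).
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
L_u, L_o, D = laplacians(3, edges)
```

For this graph `D` is diagonal with the node degrees 1, 2, 1 on the diagonal, and every row of `L_o` sums to zero, so a consensus vector leaves $\boldsymbol{\phi}_k$ unchanged in the second recursion of (14).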

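The structure of the primal objective function Hessian $\mathbf{H}_k$,
the degree matrix $\mathbf{D}$, and the oriented and unoriented
Laplacians $\mathbf{L}_o$ and $\mathbf{L}_u$ make distributed
implementation of (14) possible. Indeed, the matrix
$2c\mathbf{D} + \mathbf{H}_k$ is block diagonal and its $i$-th diagonal
block is given by $2cd_i\mathbf{I} + \nabla^2 f_i(\mathbf{x}_{i,k})$
which is locally available for node $i$. Likewise, the inverse matrix
$(2c\mathbf{D} + \mathbf{H}_k)^{-1}$ is block diagonal
and locally computable since the $i$-th diagonal block is
$(2cd_i\mathbf{I} + \nabla^2 f_i(\mathbf{x}_{i,k}))^{-1}$. Computations
of the products $\mathbf{L}_u\mathbf{x}_k$ and
$\mathbf{L}_o\mathbf{x}_{k+1}$ can be implemented in a decentralized
manner as well, since the Laplacian matrices $\mathbf{L}_u$ and
$\mathbf{L}_o$ are block neighbor sparse in the sense that the
$(i,j)$-th block is not null if and only if nodes $i$ and $j$ are
neighbors or $j = i$. Therefore, nodes can compute their local parts for
the products $\mathbf{L}_u\mathbf{x}_k$ and
$\mathbf{L}_o\mathbf{x}_{k+1}$ by exchanging information with their
neighbors. By defining components of the vector $\boldsymbol{\phi}_k$ as
$\boldsymbol{\phi}_k := [\boldsymbol{\phi}_{1,k}; \ldots; \boldsymbol{\phi}_{n,k}]$,
the update formula in (14) for the individual agents can then be written
block-wise as

$$ \mathbf{x}_{i,k+1} = \left(2cd_i\mathbf{I} + \nabla^2 f_i(\mathbf{x}_{i,k})\right)^{-1}\left[cd_i\mathbf{x}_{i,k} + c\sum_{j \in \mathcal{N}_i}\mathbf{x}_{j,k} + \nabla^2 f_i(\mathbf{x}_{i,k})\mathbf{x}_{i,k} - \nabla f_i(\mathbf{x}_{i,k}) - \boldsymbol{\phi}_{i,k}\right], \tag{15} $$

where $\mathbf{x}_{i,k}$ corresponds to the iterate of node $i$ at step $k$.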
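Notice that the definition
$\mathbf{L}_u := (1/2)\mathbf{E}_u^T\mathbf{E}_u = (1/2)(\mathbf{A}_s + \mathbf{A}_d)^T(\mathbf{A}_s + \mathbf{A}_d)$
is used to simplify the $i$-th component of
$c\mathbf{L}_u\mathbf{x}_k$ as
$c\sum_{j \in \mathcal{N}_i}(\mathbf{x}_{i,k} + \mathbf{x}_{j,k})$ which
is equivalent to
$cd_i\mathbf{x}_{i,k} + c\sum_{j \in \mathcal{N}_i}\mathbf{x}_{j,k}$.
Further, using the definition
$\mathbf{L}_o = (1/2)\mathbf{E}_o^T\mathbf{E}_o = (1/2)(\mathbf{A}_s - \mathbf{A}_d)^T(\mathbf{A}_s - \mathbf{A}_d)$,
the $i$-th component of the product $c\mathbf{L}_o\mathbf{x}_{k+1}$ in
(14) can be simplified as
$c\sum_{j \in \mathcal{N}_i}(\mathbf{x}_{i,k+1} - \mathbf{x}_{j,k+1})$.
Therefore, the second update formula in (14) can be locally
implemented at each node $i$ as

$$ \boldsymbol{\phi}_{i,k+1} = \boldsymbol{\phi}_{i,k} + c\sum_{j \in \mathcal{N}_i}(\mathbf{x}_{i,k+1} - \mathbf{x}_{j,k+1}). \tag{16} $$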

The proposed DQM method is summarized in Algorithm 1. The
initial value for the local iterate $\mathbf{x}_{i,0}$ can be any
arbitrary vector in $\mathbb{R}^p$. The initial vector
$\boldsymbol{\phi}_{i,0}$ should be in column space of $\mathbf{E}_o^T$.
To guarantee satisfaction of this condition, the initial vector is
set as $\boldsymbol{\phi}_{i,0} = \mathbf{0}$. At each iteration $k$,
updates of the primal and dual variables in (15) and (16) are computed in
Steps 2 and 4, respectively. Nodes exchange their local variables
$\mathbf{x}_{i,k+1}$ with their neighbors $j \in \mathcal{N}_i$ in
Step 3, since this information is required for the updates in Steps 2
and 4.

Algorithm 1 DQM method at node i

Require: Initial local iterates $\mathbf{x}_{i,0}$ and $\boldsymbol{\phi}_{i,0}$.
1: for k = 0, 1, 2, ... do
2:   Update the local iterate $\mathbf{x}_{i,k+1}$ as
     $$ \mathbf{x}_{i,k+1} = \left(2cd_i\mathbf{I} + \nabla^2 f_i(\mathbf{x}_{i,k})\right)^{-1}\left[cd_i\mathbf{x}_{i,k} + c\sum_{j \in \mathcal{N}_i}\mathbf{x}_{j,k} + \nabla^2 f_i(\mathbf{x}_{i,k})\mathbf{x}_{i,k} - \nabla f_i(\mathbf{x}_{i,k}) - \boldsymbol{\phi}_{i,k}\right]. $$
3:   Exchange iterates $\mathbf{x}_{i,k+1}$ with neighbors $j \in \mathcal{N}_i$.
4:   Update local dual variable $\boldsymbol{\phi}_{i,k+1}$ as
     $$ \boldsymbol{\phi}_{i,k+1} = \boldsymbol{\phi}_{i,k} + c\sum_{j \in \mathcal{N}_i}(\mathbf{x}_{i,k+1} - \mathbf{x}_{j,k+1}). $$
5: end for

DADMM, DQM, and DLM occupy different points in a tradeoff
curve of computational cost per iteration and number of iterations
needed to achieve convergence. The computational cost of each
DADMM iteration is large in general because it requires solution
of the optimization problem in (6). The cost of DLM iterations
is minimal because the solution of (9) can be reduced to the
inversion of a block diagonal matrix; see [22]. The cost of
DQM iterations is larger than the cost of DLM iterations because
they require evaluation of local Hessians as well as inversion
of the matrices $2cd_i\mathbf{I} + \nabla^2 f_i(\mathbf{x}_{i,k})$ to
implement (15). But the cost is smaller than the cost of DADMM
iterations except in cases in which solving (6) is easy. In terms of the
number of iterations required until convergence, DADMM requires the least
and DLM the most. The foremost technical conclusions of the
convergence analysis presented in the following section are: (i)
convergence of DQM is strictly faster than convergence of DLM;
(ii) asymptotically in the number of iterations, the per iteration
improvements of DADMM and DQM are identical. It follows
from these observations that DQM achieves target optimality in a
number of iterations similar to DADMM but with iterations that
are computationally cheaper.
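The node-level recursion in (15)-(16) can be sketched in a few lines for scalar variables ($p = 1$). Everything specific in this example is an assumption made for illustration and is not taken from the paper: the triangle network, the three data pairs in `DATA`, the regularized logistic local costs $f_i(x) = \log(1 + e^{-y_i s_i x}) + x^2/2$, the penalty $c = 4$, and the helper names. A centralized Newton iteration on $\sum_i f_i$ is included only as a reference against which the decentralized iterates can be compared.

```python
import math

# Hypothetical data: node i holds one pair (feature s_i, label y_i).
DATA = [(1.0, 1.0), (0.5, 1.0), (1.0, -1.0)]
# Triangle network: every node neighbors the other two.
NEIGHBORS = [[1, 2], [0, 2], [0, 1]]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def grad(i, x):
    """Gradient of f_i(x) = log(1 + exp(-y s x)) + x^2 / 2."""
    s, y = DATA[i]
    return -y * s * sigmoid(-y * s * x) + x


def hess(i, x):
    """Second derivative of f_i; it lies in (1, 1.25] because |s| <= 1."""
    s, y = DATA[i]
    p = sigmoid(-y * s * x)
    return s * s * p * (1.0 - p) + 1.0


def dqm(neighbors, c, iters):
    """Run the scalar form of Algorithm 1 at all nodes synchronously."""
    n = len(neighbors)
    x = [0.0] * n
    phi = [0.0] * n  # phi_{i,0} = 0, as suggested in the text
    for _ in range(iters):
        new_x = []
        for i in range(n):  # Step 2, eq. (15)
            d_i = len(neighbors[i])
            h = hess(i, x[i])
            rhs = (c * d_i * x[i] + c * sum(x[j] for j in neighbors[i])
                   + h * x[i] - grad(i, x[i]) - phi[i])
            new_x.append(rhs / (2.0 * c * d_i + h))
        x = new_x  # Step 3: neighbors now see the updated iterates
        for i in range(n):  # Step 4, eq. (16)
            phi[i] += c * sum(x[i] - x[j] for j in neighbors[i])
    return x


def centralized_minimizer(n, iters=50):
    """Reference solution: Newton's method on the sum of the local costs."""
    x = 0.0
    for _ in range(iters):
        g = sum(grad(i, x) for i in range(n))
        h = sum(hess(i, x) for i in range(n))
        x -= g / h
    return x


x_dqm = dqm(NEIGHBORS, c=4.0, iters=2000)
x_ref = centralized_minimizer(len(NEIGHBORS))
```

Note that the only per-node linear algebra is the division by $2cd_i + \nabla^2 f_i(x_{i,k})$, the scalar counterpart of the local matrix inversion in Step 2, and that each node touches only its own data and its neighbors' iterates.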

IV. Convergence Analysis 

In this section we show that the sequence of iterates $\mathbf{x}_k$
generated by DQM converges linearly to the optimal argument
$\mathbf{x}^* = [\tilde{\mathbf{x}}^*; \ldots; \tilde{\mathbf{x}}^*]$. As a byproduct of this analysis we also
obtain a comparison between the linear convergence constants of 
DLM, DQM, and DADMM. To derive these results we make the 
following assumptions. 

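Assumption 2 The network is such that any singular value of
the unoriented incidence matrix $\mathbf{E}_u$, defined as
$\sigma(\mathbf{E}_u)$, satisfies
$0 < \gamma_u \le \sigma(\mathbf{E}_u) \le \Gamma_u$ where $\gamma_u$ and
$\Gamma_u$ are constants; the smallest non-zero singular value of the
oriented incidence matrix $\mathbf{E}_o$ is $\gamma_o > 0$.

Assumption 3 The local objective functions $f_i(\mathbf{x})$ are twice
differentiable and the eigenvalues of their local Hessians
$\nabla^2 f_i(\mathbf{x})$ are bounded within positive constants $m$ and
$M$ where $0 < m \le M < \infty$ so that for all
$\mathbf{x} \in \mathbb{R}^p$ it holds

$$ m\mathbf{I} \preceq \nabla^2 f_i(\mathbf{x}) \preceq M\mathbf{I}. \tag{17} $$

Assumption 4 The local Hessians $\nabla^2 f_i(\mathbf{x})$ are Lipschitz
continuous with constant $L$ so that for all
$\mathbf{x}, \hat{\mathbf{x}} \in \mathbb{R}^p$ it holds

$$ \|\nabla^2 f_i(\mathbf{x}) - \nabla^2 f_i(\hat{\mathbf{x}})\| \le L\|\mathbf{x} - \hat{\mathbf{x}}\|. \tag{18} $$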

The eigenvalue bounds in Assumption 2 are measures of
network connectivity. Note that the assumption that all the singular
values of the unoriented incidence matrix $\mathbf{E}_u$ are positive
implies that the graph is non-bipartite. The conditions imposed
by Assumptions 3 and 4 are typical in the analysis of second
order methods; see, e.g., [23, Chapter 9]. The lower bound for
the eigenvalues of the local Hessians $\nabla^2 f_i(\mathbf{x})$ implies
strong convexity of the local objective functions $f_i(\mathbf{x})$ with
constant $m$, while the upper bound $M$ for the eigenvalues of the local
Hessians $\nabla^2 f_i(\mathbf{x})$ is tantamount to Lipschitz continuity
of local gradients $\nabla f_i(\mathbf{x})$ with Lipschitz constant $M$.
Further note that as per the definition of the aggregate objective
$f(\mathbf{x}) := \sum_{i=1}^{n} f_i(\mathbf{x}_i)$,
the Hessian
$\mathbf{H}(\mathbf{x}) := \nabla^2 f(\mathbf{x}) \in \mathbb{R}^{np \times np}$
is block diagonal with $i$-th diagonal block given by the $i$-th local
objective function Hessian $\nabla^2 f_i(\mathbf{x}_i)$. Therefore, the
bounds for the local Hessians' eigenvalues in (17) also hold for the
aggregate function Hessian. Thus, we have that for any
$\mathbf{x} \in \mathbb{R}^{np}$ the eigenvalues of the Hessian
$\mathbf{H}(\mathbf{x})$ are uniformly bounded as

$$ m\mathbf{I} \preceq \mathbf{H}(\mathbf{x}) \preceq M\mathbf{I}. \tag{19} $$

Assumption 4 also implies an analogous condition for the aggregate
function Hessian $\mathbf{H}(\mathbf{x})$ as we show in the following
lemma.


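Lemma 2 Consider the definition of the aggregate function
$f(\mathbf{x}) := \sum_{i=1}^{n} f_i(\mathbf{x}_i)$. If Assumption 4
holds true, the aggregate function Hessian
$\mathbf{H}(\mathbf{x}) =: \nabla^2 f(\mathbf{x})$ is Lipschitz
continuous with constant $L$. I.e., for all
$\mathbf{x}, \hat{\mathbf{x}} \in \mathbb{R}^{np}$ we can write

$$ \|\mathbf{H}(\mathbf{x}) - \mathbf{H}(\hat{\mathbf{x}})\| \le L\|\mathbf{x} - \hat{\mathbf{x}}\|. \tag{20} $$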

Proof: See Appendix ■ 

DQM can be interpreted as an attempt to approximate the primal
update of DADMM. Therefore, we evaluate the performance
of DQM by studying a measure of the error of the approximation
in the DQM update relative to the DADMM update. In the primal
update of DQM, the gradient $\nabla f(\mathbf{x}_{k+1})$ is estimated by
the approximation
$\nabla f(\mathbf{x}_k) + \mathbf{H}_k(\mathbf{x}_{k+1} - \mathbf{x}_k)$.
Therefore, we can define the DQM error vector $\mathbf{e}_k^{DQM}$ as

$$ \mathbf{e}_k^{DQM} := \nabla f(\mathbf{x}_k) + \mathbf{H}_k(\mathbf{x}_{k+1} - \mathbf{x}_k) - \nabla f(\mathbf{x}_{k+1}). \tag{21} $$

Based on the definition in (21), the approximation error of DQM
vanishes when the difference of two consecutive iterates
$\mathbf{x}_{k+1} - \mathbf{x}_k$ approaches zero. This observation is
formalized in the following proposition introducing an upper bound for
the error vector norm $\|\mathbf{e}_k^{DQM}\|$ in terms of the difference
norm $\|\mathbf{x}_{k+1} - \mathbf{x}_k\|$.


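Proposition 2 Consider the DQM method as introduced in (12)-(13)
and the error $\mathbf{e}_k^{DQM}$ defined in (21). If Assumptions 1-4
hold true, the DQM error norm $\|\mathbf{e}_k^{DQM}\|$ is bounded above by

$$ \|\mathbf{e}_k^{DQM}\| \le \min\left\{2M\|\mathbf{x}_{k+1} - \mathbf{x}_k\|,\ \frac{L}{2}\|\mathbf{x}_{k+1} - \mathbf{x}_k\|^2\right\}. \tag{22} $$

Proof: See Appendix D. ■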

Proposition 2 asserts that the error norm $\|\mathbf{e}_k^{DQM}\|$ is
bounded above by the minimum of a linear and a quadratic term of the
iterate difference norm $\|\mathbf{x}_{k+1} - \mathbf{x}_k\|$. Hence, the
approximation error vanishes as the sequence of iterates $\mathbf{x}_k$
converges. We will show in Theorem 1 that the sequence
$\|\mathbf{x}_{k+1} - \mathbf{x}_k\|$ converges to zero which implies
that the error vector $\mathbf{e}_k^{DQM}$ converges to the null vector
$\mathbf{0}$. Notice that after a number of iterations the term
$(L/2)\|\mathbf{x}_{k+1} - \mathbf{x}_k\|$ becomes smaller than $2M$,
which implies that the upper bound in (22) can be simplified as
$(L/2)\|\mathbf{x}_{k+1} - \mathbf{x}_k\|^2$ for sufficiently large $k$.
This is important because it implies that the error vector norm
$\|\mathbf{e}_k^{DQM}\|$ eventually becomes proportional to the quadratic
term $\|\mathbf{x}_{k+1} - \mathbf{x}_k\|^2$ and, as a consequence, it
vanishes faster than the term $\|\mathbf{x}_{k+1} - \mathbf{x}_k\|$.
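The bound in (22) can be checked numerically on a simple scalar cost. In the sketch below the cost $f(x) = \log(1 + e^{x}) + x^2/2$, the constants `M = 1.25` and `L = 0.1`, and the sample pairs are assumptions chosen for illustration: the second derivative of this $f$ is at most $1.25$, and its third derivative is bounded in absolute value by $1/(6\sqrt{3}) < 0.1$, so these values are valid (not necessarily tight) choices for the constants in Assumptions 3 and 4.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


# Hypothetical scalar cost f(x) = log(1 + exp(x)) + x^2 / 2.
def grad(x):
    return sigmoid(x) + x


def hess(x):
    s = sigmoid(x)
    return s * (1.0 - s) + 1.0


M = 1.25  # upper bound on f'' since s(1 - s) <= 1/4
L = 0.1   # Lipschitz constant of f'' since |f'''| <= 1/(6*sqrt(3)) < 0.1


def dqm_error(x_k, x_next):
    """Scalar version of the error in (21)."""
    return abs(grad(x_k) + hess(x_k) * (x_next - x_k) - grad(x_next))


def error_bound(x_k, x_next):
    """Right-hand side of (22)."""
    step = abs(x_next - x_k)
    return min(2.0 * M * step, 0.5 * L * step ** 2)


# Pairs (x_k, x_{k+1}) with large and small steps.
pairs = [(0.0, 3.0), (1.0, -1.0), (-2.0, -1.5), (0.3, 0.4)]
checks = [(dqm_error(a, b), error_bound(a, b)) for a, b in pairs]
```

For every pair the first entry of `checks` does not exceed the second, and for the small step from 0.3 to 0.4 the quadratic term is the active one, which is the regime the preceding paragraph describes.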

Utilize now the definition in (21) to rewrite the primal variable
DQM update in (12) as

$$ \nabla f(\mathbf{x}_{k+1}) + \mathbf{e}_k^{DQM} + \mathbf{A}^T\boldsymbol{\lambda}_k + c\mathbf{A}^T(\mathbf{A}\mathbf{x}_{k+1} + \mathbf{B}\mathbf{z}_k) = \mathbf{0}. \tag{23} $$

Comparison of (23) with the optimality condition for the
DADMM update in (6) shows that they coincide except for
the gradient approximation error term $\mathbf{e}_k^{DQM}$. The DQM and
DADMM updates for the auxiliary variables $\mathbf{z}_k$ and the dual
variables $\boldsymbol{\lambda}_k$ are identical [cf. (7), (8), and
(13)], as already observed.

Further let the pair $(\mathbf{x}^*, \mathbf{z}^*)$ stand for the unique
solution of (4) with uniqueness implied by the strong convexity
assumption and define $\boldsymbol{\alpha}^*$ as the unique optimal
multiplier that lies in the column space of $\mathbf{E}_o$ - see Lemma 1
of [?] for a proof that such optimal dual variable exists and is unique.
To study convergence properties of DQM we modify the system of DQM
equations defined by (13) and (23), which is equivalent to the system
(12)-(13), to include terms that involve differences between current
iterates and the optimal arguments $\mathbf{x}^*$, $\mathbf{z}^*$, and
$\boldsymbol{\alpha}^*$. We state this
reformulation in the following lemma.

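Lemma 3 Consider the DQM method as defined by (12)-(13) and
its equivalent formulation in (13) and (23). If Assumption 1 holds
true, then the optimal arguments $\mathbf{x}^*$, $\mathbf{z}^*$, and
$\boldsymbol{\alpha}^*$ satisfy

$$ \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}^*) + \mathbf{e}_k^{DQM} + \mathbf{E}_o^T(\boldsymbol{\alpha}_{k+1} - \boldsymbol{\alpha}^*) - c\mathbf{E}_u^T(\mathbf{z}_k - \mathbf{z}_{k+1}) = \mathbf{0}, \tag{24} $$

$$ 2(\boldsymbol{\alpha}_{k+1} - \boldsymbol{\alpha}_k) - c\mathbf{E}_o(\mathbf{x}_{k+1} - \mathbf{x}^*) = \mathbf{0}, \tag{25} $$

$$ \mathbf{E}_u(\mathbf{x}_k - \mathbf{x}^*) - 2(\mathbf{z}_k - \mathbf{z}^*) = \mathbf{0}. \tag{26} $$

Proof: See Appendix E. ■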

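With the preliminary results in the preceding lemmata and in
Proposition 2 we can state our convergence results. To do so, define the
energy function
$V : \mathbb{R}^{mp} \times \mathbb{R}^{mp} \to \mathbb{R}$ as

$$ V(\mathbf{z}, \boldsymbol{\alpha}) := c\|\mathbf{z} - \mathbf{z}^*\|^2 + \frac{1}{c}\|\boldsymbol{\alpha} - \boldsymbol{\alpha}^*\|^2. \tag{27} $$

The energy function $V(\mathbf{z}, \boldsymbol{\alpha})$ captures the
distances of the variables $\mathbf{z}_k$ and $\boldsymbol{\alpha}_k$ to
the respective optimal arguments $\mathbf{z}^*$ and
$\boldsymbol{\alpha}^*$. To simplify notation we further define the
variable $\mathbf{u} \in \mathbb{R}^{2mp}$ and
matrix $\mathbf{C} \in \mathbb{R}^{2mp \times 2mp}$ as

$$ \mathbf{u} := \begin{bmatrix} \mathbf{z} \\ \boldsymbol{\alpha} \end{bmatrix}, \qquad \mathbf{C} := \begin{bmatrix} c\mathbf{I}_{mp} & \mathbf{0} \\ \mathbf{0} & (1/c)\mathbf{I}_{mp} \end{bmatrix}. \tag{28} $$

Based on the definitions in (28), the energy function in (27) can be
alternatively written
$V(\mathbf{z}, \boldsymbol{\alpha}) = V(\mathbf{u}) = \|\mathbf{u} - \mathbf{u}^*\|_{\mathbf{C}}^2$,
where $\mathbf{u}^* = [\mathbf{z}^*; \boldsymbol{\alpha}^*]$. The energy
sequence
$V(\mathbf{u}_k) = \|\mathbf{u}_k - \mathbf{u}^*\|_{\mathbf{C}}^2$
converges to zero at a linear rate as we state in the following theorem.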


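Theorem 1 Consider the DQM method as defined by (12)-(13),
let the constant $c$ be such that $c > 4M^2/(m\gamma_u^2)$, and define the
sequence of non-negative variables $\zeta_k$ as

$$ \zeta_k := \min\left\{\frac{L}{2}\|\mathbf{x}_{k+1} - \mathbf{x}_k\|,\ 2M\right\}. \tag{29} $$

Further, consider arbitrary constants $\mu$, $\mu'$, and $\eta_k$ with
$\mu, \mu' > 1$ and
$\eta_k \in (\zeta_k/m,\ c\gamma_u^2/\zeta_k)$. If Assumptions 1-4 hold
true, then the sequence $\|\mathbf{u}_k - \mathbf{u}^*\|_{\mathbf{C}}^2$
generated by DQM satisfies

$$ \|\mathbf{u}_{k+1} - \mathbf{u}^*\|_{\mathbf{C}}^2 \le \frac{1}{1 + \delta_k}\|\mathbf{u}_k - \mathbf{u}^*\|_{\mathbf{C}}^2, \tag{30} $$

where the sequence of positive scalars $\delta_k$ is given by

$$ \delta_k = \min\left\{\frac{(\mu - 1)(c\gamma_u^2 - \eta_k\zeta_k)\gamma_o^2}{\mu\mu'\left(c\Gamma_u^2\gamma_u^2 + 4\zeta_k^2/(c(\mu' - 1))\right)},\ \frac{m - \zeta_k/\eta_k}{c\Gamma_u^2/4 + \mu M^2/(c\gamma_o^2)}\right\}. \tag{31} $$

Proof: See Appendix. ■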

Notice that $\delta_k$ is a decreasing function of $\zeta_k$ and that
$\zeta_k$ is bounded above by $2M$. Therefore, if we substitute $\zeta_k$
by $2M$ in (31) the inequality in (30) is still valid. This substitution
implies that the sequence
$\|\mathbf{u}_k - \mathbf{u}^*\|_{\mathbf{C}}^2$ converges linearly to
zero with a coefficient not larger than $1/(1 + \delta)$ with
$\delta = \delta_k$ following from (31) with $\zeta_k = 2M$. The more
generic definition of $\zeta_k$ in (29) is important for the rate
comparisons in Section IV-A. Observe that in order to guarantee that
$\delta_k > 0$ for all $k \ge 0$, $\eta_k$ is chosen from
the interval $(\zeta_k/m,\ c\gamma_u^2/\zeta_k)$. This interval is
non-empty since the constant $c$ is chosen as
$c > 4M^2/(m\gamma_u^2) \ge \zeta_k^2/(m\gamma_u^2)$.
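The dependence of the constant in (31) on $\zeta_k$ can be explored numerically. The following sketch evaluates $\delta_k$ for a decreasing sequence of values of $\zeta_k$. All numerical values in `params` are hypothetical and were picked only so that the conditions of Theorem 1 hold: $c = 20 > 4M^2/(m\gamma_u^2) = 16$, and $\eta = 4.5$ lies in $(\zeta/m,\ c\gamma_u^2/\zeta)$ for every $\zeta$ used. The function name `delta_k` and the argument name `mu_p` (for $\mu'$) are likewise introduced for this example.

```python
def delta_k(zeta, eta, c, m, M, gamma_u, Gamma_u, gamma_o, mu, mu_p):
    """Evaluate the constant in (31); mu_p stands for mu'."""
    first = ((mu - 1.0) * (c * gamma_u ** 2 - eta * zeta) * gamma_o ** 2
             / (mu * mu_p * (c * Gamma_u ** 2 * gamma_u ** 2
                             + 4.0 * zeta ** 2 / (c * (mu_p - 1.0)))))
    second = ((m - zeta / eta)
              / (c * Gamma_u ** 2 / 4.0 + mu * M ** 2 / (c * gamma_o ** 2)))
    return min(first, second)


# Hypothetical constants chosen to satisfy the conditions of Theorem 1.
params = dict(eta=4.5, c=20.0, m=1.0, M=2.0, gamma_u=1.0, Gamma_u=2.0,
              gamma_o=1.0, mu=2.0, mu_p=2.0)

# zeta = 2M = 4 is the worst case; smaller zeta mimics later iterations.
zetas = (4.0, 2.0, 1.0, 0.1)
deltas = [delta_k(zeta, **params) for zeta in zetas]
```

With these values `deltas` is positive and increases as `zeta` shrinks, so the guaranteed contraction factor $1/(1 + \delta_k)$ improves as consecutive iterates get closer, in line with the monotonicity noted above.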

The linear convergence in Theorem 1 is for the vector $\mathbf{u}_k$
which includes the auxiliary variable $\mathbf{z}_k$ and the multipliers
$\boldsymbol{\alpha}_k$. Linear convergence of the primal variables
$\mathbf{x}_k$ to the optimal argument $\mathbf{x}^*$ follows as a
corollary that we establish next.


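Corollary 1 Under the assumptions in Theorem 1, the sequence
of squared norms $\|\mathbf{x}_k - \mathbf{x}^*\|^2$ generated by the DQM
algorithm converges R-linearly to zero, i.e.,

$$ \|\mathbf{x}_k - \mathbf{x}^*\|^2 \le \frac{4}{c\gamma_u^2}\|\mathbf{u}_k - \mathbf{u}^*\|_{\mathbf{C}}^2. \tag{32} $$

Proof: Notice that according to (26) we can write
$\|\mathbf{E}_u(\mathbf{x}_k - \mathbf{x}^*)\|^2 = 4\|\mathbf{z}_k - \mathbf{z}^*\|^2$.
Since $\gamma_u$ is the smallest singular value of $\mathbf{E}_u$, we
obtain that
$\|\mathbf{x}_k - \mathbf{x}^*\|^2 \le (4/\gamma_u^2)\|\mathbf{z}_k - \mathbf{z}^*\|^2$.
Moreover, according to the relation
$\|\mathbf{u}_k - \mathbf{u}^*\|_{\mathbf{C}}^2 = c\|\mathbf{z}_k - \mathbf{z}^*\|^2 + (1/c)\|\boldsymbol{\alpha}_k - \boldsymbol{\alpha}^*\|^2$
we can write
$c\|\mathbf{z}_k - \mathbf{z}^*\|^2 \le \|\mathbf{u}_k - \mathbf{u}^*\|_{\mathbf{C}}^2$.
Combining these two inequalities yields the claim in (32). ■

As per Corollary 1, convergence of the sequence $\mathbf{x}_k$ to
$\mathbf{x}^*$ is dominated by a linearly decreasing sequence. Notice
that the sequence of squared norms $\|\mathbf{x}_k - \mathbf{x}^*\|^2$
need not be monotonically decreasing as the energy sequence
$\|\mathbf{u}_{k+1} - \mathbf{u}^*\|_{\mathbf{C}}^2$ is.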


Proposition 3 Consider the DLM algorithm with updates in (7)-(9)
and the error vector $\mathbf{e}_k^{DLM}$ defined in (34). If
Assumptions 1-4 hold true, the DLM error vector norm
$\|\mathbf{e}_k^{DLM}\|$ satisfies

$$ \|\mathbf{e}_k^{DLM}\| \le (\rho + M)\|\mathbf{x}_{k+1} - \mathbf{x}_k\|. \tag{35} $$

Proof: See Appendix D. ■

The result in Proposition [^differs from Proposition]^ in that the 
DLM error ||eP^'^|| vanishes at a rate of Ijx^+i—x^H whereas the 

>QM|, 


DQM error |jet 


eventually becomes proportional to Ijx^+i — 


Xfelp. This results in DLM failing to approach the convergence 
behavior of DADMM as we show in the following theorem. 


Theorem 2 Consider the DLM method as introduced in 0- 
0. Assume that the constant c is chosen such that c > 
{pMY'/ {myf). Moreover, consider p,pl > 1 as arbitrary 
constants and p as a positive constant chosen from the interval 
{{p + M)/m,C'y^/{p + M)). If Assumptions 1 4 hold true, then 


the sequence ||ufe — u*||^ generated by DLM satisfies 


!ufc+i-u*||^ < 


1 + 5 


!ufc - u*llc , 


(36) 


where the scalar S is given by 


S — i 


f (m- l)(c7i-Vk(p+M))y^ m- {p+M)/pk \ 
{crljl+4:{p+MY/c{p'-1))' cTl/i+pM^c-fli 

(37) 


Proof: See Appendix]^ ■ 

Based on the result in Theorem]^ the sequence Ijufe+i — u*||q 
generated by DLM converges linearly to 0. This result is similar 
to the convergence properties of DQM as shown in Theorem 
however, the constant of linear convergence 1/(1 + 5) in is 
smaller than the constant 1/(1 + 5k) in ([3^. 


A. Convergence rates comparison 

Based on the result in Corollary the sequence of iterates x^ 
generated by DQM converges. This observation implies that the 
sequence ||xfc+i — Xfc|| approaches zero. Hence, the sequence of 
scalars defined in ( |29] l converges to 0 as time passes, since 
C,k is bounded above by (L/2)||x/j+i — Xf;||. Using this fact that 
limfc_>.oo C/c = 0 to compute the limit of 5k in pTj ) and further 
making p,' —1 in the resulting limit we have that 


lim 5k 

k—foo 




m 


crj/4 + pM'^/cyl 


(33) 


Notice that the limit of 5k in ( [3^ is identical to the constant 
of linear convergence for DADMM p9) . Therefore, we conclude 
that as time passes the constant of linear convergence for DQM 
approaches the one for DADMM. 

To compare the convergence rates of DLM, DQM and DADMM 
we define the error of the gradient approximation for DLM as 


ef = V/(xfe) + p(xfc+i - Xfc) - V/(xfc+i), (34) 


which is the difference of exact gradient V/(xfe+i) and the DLM 
gradient approximation Wf{xk) + p{x.k+i — x^,). Similar to the 
result in Proposition!^ for DQM we can show that the DLM error 
vector norm is bounded by a factor of ||xfc+i — Xfc||. 
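The two error bounds discussed in this section can be checked numerically on a small logistic loss. The sketch below is illustrative only: the data, the DLM parameter `rho`, and the step sizes are hypothetical, and the DQM error is assumed to be defined like (34) with $H_k(x_{k+1} - x_k)$ in place of $\rho(x_{k+1} - x_k)$. The constants `M` and `L` are conservative bounds for this particular loss, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
q, p = 20, 3                               # hypothetical sizes
S = rng.normal(size=(q, p))                # rows are feature vectors
y = rng.choice([-1.0, 1.0], size=q)        # labels in {-1, +1}

def grad(x):
    """Gradient of sum_l log(1 + exp(-y_l s_l^T x))."""
    t = y * (S @ x)
    return -(S.T @ (y / (1.0 + np.exp(t))))

def hess(x):
    """Hessian of the same logistic loss."""
    t = y * (S @ x)
    sig = 1.0 / (1.0 + np.exp(-t))
    return S.T @ ((sig * (1.0 - sig))[:, None] * S)

# sig*(1-sig) <= 1/4 bounds the Hessian eigenvalues; its derivative is
# bounded by 1/(6*sqrt(3)), which gives a Lipschitz constant for the Hessian.
M = 0.25 * np.linalg.norm(S, 2) ** 2
L = np.sum(np.linalg.norm(S, axis=1) ** 3) / (6.0 * np.sqrt(3.0))
rho = 1.0                                  # hypothetical DLM parameter

def error_norms(x_k, x_next):
    d = x_next - x_k
    e_dlm = grad(x_k) + rho * d - grad(x_next)          # first-order model
    e_dqm = grad(x_k) + hess(x_k) @ d - grad(x_next)    # quadratic model
    return np.linalg.norm(e_dlm), np.linalg.norm(e_dqm), np.linalg.norm(d)

x_k = rng.normal(size=p)
direction = rng.normal(size=p)
for step in (1.0, 0.1, 0.01):
    n_dlm, n_dqm, n_d = error_norms(x_k, x_k + step * direction)
    print(f"step={step:5.2f}  DLM/|d|={n_dlm / n_d:.4f}  DQM/|d|^2={n_dqm / n_d**2:.4f}")
```

As the step shrinks, the DLM ratio is measured against $\|x_{k+1} - x_k\|$ while the DQM ratio is measured against $\|x_{k+1} - x_k\|^2$, which is the distinction the comparison above relies on.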


V. Numerical analysis 


In this section we compare the performances of DLM, DQM and DADMM in solving a logistic regression problem. Consider a training set with points whose classes are known and the goal is finding the classifier that minimizes the loss function. Let $q$ be the number of training points available at each node of the network. Therefore, the total number of training points is $nq$. The training set $\{s_{il}, y_{il}\}_{l=1}^{q}$ at node $i$ contains $q$ pairs $(s_{il}, y_{il})$, where $s_{il}$ is a feature vector and $y_{il} \in \{-1, 1\}$ is the corresponding class. The goal is to estimate the probability $P(y = 1 \mid s)$ of having label $y = 1$ for a given feature vector $s$ whose class is not known. Logistic regression models this probability as $P(y = 1 \mid s) = 1/(1 + \exp(-s^T\tilde{x}))$ for a linear classifier $\tilde{x}$ that is computed based on the training samples. It follows from this model that the maximum log-likelihood estimate of the classifier $\tilde{x}$ given the training samples $\{\{s_{il}, y_{il}\}_{l=1}^{q}\}_{i=1}^{n}$ is

$$\tilde{x}^* := \operatorname*{argmin}_{\tilde{x}\in\mathbb{R}^p}\ \sum_{i=1}^{n}\sum_{l=1}^{q}\log\left[1+\exp\left(-y_{il}s_{il}^T\tilde{x}\right)\right]. \tag{38}$$

The optimization problem in (38) can be written in the form (1). To do so, simply define the local objective functions $f_i$ as

$$f_i(\tilde{x}) = \sum_{l=1}^{q}\log\left[1+\exp\left(-y_{il}s_{il}^T\tilde{x}\right)\right]. \tag{39}$$
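A minimal Python sketch of the local logistic objective and its gradient may help make the splitting of (38) into per-node terms concrete. The function names and the synthetic data are hypothetical; the sizes $n = 10$, $q = 5$, $p = 3$ mirror the first experiment reported in this section, but the samples themselves are random placeholders.

```python
import numpy as np

def local_objective(S_i, y_i, x):
    """f_i(x): sum over the q local samples of log(1 + exp(-y * s^T x))."""
    margins = y_i * (S_i @ x)
    # logaddexp(0, -t) = log(1 + exp(-t)), evaluated without overflow.
    return float(np.sum(np.logaddexp(0.0, -margins)))

def local_gradient(S_i, y_i, x):
    """Gradient of f_i at x."""
    margins = y_i * (S_i @ x)
    # 1 / (1 + exp(t)) written as 0.5 * (1 - tanh(t / 2)) for stability.
    coeff = -y_i * 0.5 * (1.0 - np.tanh(0.5 * margins))
    return S_i.T @ coeff

def global_objective(S, y, x):
    """Sum of all local objectives evaluated at a common classifier x."""
    return sum(local_objective(S[i], y[i], x) for i in range(S.shape[0]))

rng = np.random.default_rng(0)
n, q, p = 10, 5, 3
S = rng.normal(size=(n, q, p))             # placeholder feature vectors
y = rng.choice([-1.0, 1.0], size=(n, q))   # placeholder labels
x = rng.normal(size=p)
print("global loss at a random classifier:", global_objective(S, y, x))
```

Each node only needs its own block `S[i]`, `y[i]` to evaluate `local_objective` and `local_gradient`, which is what makes the problem suitable for the decentralized methods compared here.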
Fig. 1: Relative error $\|x_k - x^*\|/\|x_0 - x^*\|$ of DADMM, DQM, and DLM versus number of iterations for a random network of size $n = 10$. The convergence path of DQM is similar to the one for DADMM and they outperform DLM by orders of magnitude.

Fig. 2: Relative error $\|x_k - x^*\|/\|x_0 - x^*\|$ of DADMM, DQM, and DLM versus runtime for the setting in Fig. 1. The computational cost of DQM is lower than DADMM and DLM.

Fig. 3: Relative error $\|x_k - x^*\|/\|x_0 - x^*\|$ of DADMM, DQM, and DLM versus number of iterations for a random network of size $n = 100$. The performances of DQM and DADMM are still similar. DLM is impractical in this setting.

Fig. 4: Relative error $\|x_k - x^*\|/\|x_0 - x^*\|$ of DADMM, DQM, and DLM versus runtime for the setting in Fig. 3. The convergence time of DADMM is slightly faster relative to DLM, while DQM is the most efficient method among these three algorithms.


We define the optimal argument for decentralized optimization as $x^* = [\tilde{x}^*; \dots; \tilde{x}^*]$. Note that the reference (ground truth) logistic classifiers $\tilde{x}^*$ for all the experiments in this section are pre-computed with a centralized method.

A. Comparison of DLM, DQM, and DADMM 

We compare the convergence paths of the DLM, DQM, and DADMM algorithms for solving the logistic regression problem in (38). Edges between the nodes are randomly generated with the connectivity ratio $r_c$. Observe that the connectivity ratio is the probability of two nodes being connected.
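One possible way to generate such a network and the associated matrices is sketched below in Python. Several choices are assumptions made for illustration: graphs are resampled until connected, every undirected edge contributes two arcs (one per direction), and each node holds a scalar variable ($p = 1$); for $p > 1$ each matrix would be expanded with a Kronecker product by the identity $I_p$.

```python
import numpy as np

def is_connected(adj):
    """Depth-first search from node 0 over a boolean adjacency matrix."""
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in np.flatnonzero(adj[i]):
            if int(j) not in seen:
                seen.add(int(j))
                stack.append(int(j))
    return len(seen) == adj.shape[0]

def random_connected_graph(n, r_c, rng):
    """Connect each pair of nodes with probability r_c; retry until connected."""
    while True:
        upper = np.triu(rng.random((n, n)) < r_c, k=1)
        adj = upper | upper.T
        if is_connected(adj):
            return adj

def incidence_matrices(adj):
    """Return (E_o, E_u) = (A_s - A_d, A_s + A_d) with one row per arc."""
    n = adj.shape[0]
    arcs = [(i, j) for i in range(n) for j in range(n) if adj[i, j]]
    A_s = np.zeros((len(arcs), n))   # selects the source node of each arc
    A_d = np.zeros((len(arcs), n))   # selects the destination node of each arc
    for e, (i, j) in enumerate(arcs):
        A_s[e, i] = 1.0
        A_d[e, j] = 1.0
    return A_s - A_d, A_s + A_d

rng = np.random.default_rng(0)
adj = random_connected_graph(10, 0.4, rng)
E_o, E_u = incidence_matrices(adj)
L_o = E_o.T @ E_o / 2.0          # oriented Laplacian
L_u = E_u.T @ E_u / 2.0          # unoriented Laplacian
D = (L_u + L_o) / 2.0            # degree matrix under the two-arc convention
print("edges:", int(adj.sum()) // 2, " degrees:", np.diag(D).astype(int))
```

Under the two-arc convention assumed here, `D` comes out as the diagonal degree matrix and `L_o` as the usual graph Laplacian.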

In the first experiment we set the number of nodes as $n = 10$ and the connectivity ratio as $r_c = 0.4$. Each agent holds $q = 5$ samples and the dimension of feature vectors is $p = 3$. Fig. 1 illustrates the relative errors $\|x_k - x^*\|/\|x_0 - x^*\|$ for DLM, DQM, and DADMM versus the number of iterations. Notice that the parameter $c$ for the three methods is optimized as $c_{\text{ADMM}} = 0.7$, $c_{\text{DLM}} = 5.5$, and $c_{\text{DQM}} = 0.7$. The convergence path of DQM is almost identical to the convergence path of DADMM. Moreover, DQM outperforms DLM by orders of magnitude. To be more
precise, the relative errors $\|x_k - x^*\|/\|x_0 - x^*\|$ for DQM and
DADMM after k = 300 iterations are below 10“®, while for DLM 
the relative error after the same number of iterations is 5 x 10“^. 
Conversely, achieving accuracy |jxfc — x*||/||xo — x*|| = 10“^ 
for DQM and DADMM requires 91 iterations, while DLM 
requires 758 iterations to reach the same accuracy. Hence, the 


number of iterations that DLM requires to achieve a specific
accuracy is 8 times more than the one for DQM.

Observe that the computational complexity of DQM is lower than that of DADMM. Therefore, DQM outperforms DADMM in terms of convergence time or number of required operations until convergence. This phenomenon is shown in Fig. 2 by comparing the relative errors of DLM, DQM, and DADMM versus CPU
runtime. According to Fig. 2, DADMM achieves the relative error
||xfc — x*|j/||xo — x*|| = 10“^° after running for 3.6 seconds, 
while DLM and DQM require 1.3 and 0.4 seconds, respectively, 
to achieve the same accuracy. 

We also compare the performances of DLM, DQM, and DADMM in a larger scale logistic regression problem by setting the size of the network to $n = 100$, the number of sample points at each node to $q = 20$, and the dimension of feature vectors to $p = 10$. We keep the rest of the parameters as in Fig. 1. Convergence paths of the relative errors $\|x_k - x^*\|/\|x_0 - x^*\|$ for DLM, DQM, and DADMM versus the number of iterations are illustrated in Fig. 3. Different choices of parameter $c$ are considered for these algorithms and the best for each is chosen for the final comparison. The optimal choices of parameter $c$ for DADMM, DLM, and DQM are $c_{\text{ADMM}} = 0.68$, $c_{\text{DLM}} = 12.3$, and $c_{\text{DQM}} = 0.68$, respectively. The results for the large scale problem in Fig. 3 are similar to the results in Fig. 1. We observe that DQM performs as well as DADMM, while both outperform DLM. To be more precise,
Fig. 5: Relative error $\|x_k - x^*\|/\|x_0 - x^*\|$ of DQM for parameters $c = 0.2$, $c = 0.4$, $c = 0.8$, and $c = 1$ when the network is formed by $n = 10$ nodes and the connectivity ratio is $r_c = 0.4$. The best performance belongs to $c = 0.8$.

Fig. 6: Relative error $\|x_k - x^*\|/\|x_0 - x^*\|$ of DQM for random graphs with different connectivity ratios $r_c$. The linear convergence of DQM accelerates by increasing the connectivity ratio.


DQM and DADMM after k = 900 iterations reach the relative 
error ||xfe — x*|j/||xo — x*|| = 3.4x 10“^, while the relative error 
of DLM after the same number of iterations is 2.9 x 10“^. Con¬ 
versely, achieving the accuracy ||xfc — x*|]/||xo — x*|| = 0.3 for 
DQM and DADMM requires 52 iterations, while DLM requires 870 iterations to reach the same accuracy. Hence, in this setting the number of iterations that DLM requires to achieve a specific accuracy is 16 times more than the one for DQM. These numbers show that the advantages of DQM relative to DLM are more significant in large scale problems.

Notice that in large scale logistic regression problems we expect a larger condition number for the objective function $f$. In these scenarios we expect to observe a poor performance by the DLM algorithm, which only operates on first-order information. This expectation is confirmed by comparing the relative errors of DLM, DQM, and DADMM versus runtime for the large scale problem in Fig. 4. In this case, DLM is even worse than DADMM, which has a very high computational complexity. Similar to the result in Fig. 2, DQM has the best performance among these three methods.

B. Effect of the regularization parameter c 

The parameter $c$ has a significant role in the convergence of DADMM. Likewise, choosing the optimal value of $c$ is critical in the convergence of DQM. We study the effect of $c$ by tuning this parameter for a fixed network and training set. We use all the parameters in Fig. 1 and we compare the performance of the DQM algorithm for the values $c = 0.2$, $c = 0.4$, $c = 0.8$, and $c = 1$. Fig. 5 illustrates the convergence paths of the DQM algorithm for different choices of the parameter $c$. The best performance among these choices is achieved for $c = 0.8$. The comparison of the plots in Fig. 5 shows that increasing or decreasing the parameter $c$ does not necessarily lead to faster convergence. We can interpret $c$ as the stepsize of DQM, whose optimal choice may vary for problems with different network sizes, network topologies, condition numbers of objective functions, etc.

C. Effect of network topology 

According to (31) the constant of linear convergence for DQM depends on the bounds for the singular values of the oriented and unoriented incidence matrices $E_o$ and $E_u$. These bounds are related to the connectivity ratio of the network. We study how the network topology affects the convergence speed of DQM. We use different values for the connectivity ratio to generate random graphs with different numbers of edges. In this experiment we use the connectivity ratios $r_c \in \{0.2, 0.3, 0.4, 0.6\}$ to generate the networks. The rest of the parameters are the same as the parameters in Fig. 1. Notice that since the connectivity parameters of these graphs are different, the optimal choices of $c$ for these graphs are different. The convergence paths of DQM with the connectivity ratios $r_c \in \{0.2, 0.3, 0.4, 0.6\}$ are shown in Fig. 6. The optimal choices of the parameter $c$ for these graphs are $c_{0.2} = 0.28$, $c_{0.3} = 0.25$, $c_{0.4} = 0.31$, and $c_{0.6} = 0.28$, respectively. Fig. 6 shows that the linear convergence of DQM accelerates by increasing the connectivity ratio of the graph.

VI. Conclusions 

A decentralized quadratically approximated version of the alternating
direction method of multipliers (DQM) is proposed for
solving decentralized optimization problems where components of 
the objective function are available at different nodes of a network. 
DQM minimizes a quadratic approximation of the convex problem 
that DADMM solves exactly at each step, and hence reduces 
the computational complexity of DADMM. Under some mild 
assumptions, linear convergence of the sequence generated by 
DQM is proven. Moreover, the constant of linear convergence 
for DQM approaches that of DADMM asymptotically. Numerical 
results for a logistic regression problem verify the analytical 
results that convergence paths of DQM and DADMM are similar 
for large iteration index, while the computational complexity of 
DQM is significantly smaller than DADMM. 

Appendix A
Proof of Lemma 1

According to the update for the Lagrange multiplier $\lambda$ in (13), we can substitute $\lambda_k$ by $\lambda_{k+1} - c(Ax_{k+1} + Bz_{k+1})$. Applying this substitution into the first-order optimality condition of the update for $z_{k+1}$ leads to

$$B^T\lambda_{k+1} = 0. \tag{40}$$

Observing the definitions $B = [-I_{mp}; -I_{mp}]$ and $\lambda = [\alpha; \beta]$, and the result in (40), we obtain $\alpha_{k+1} = -\beta_{k+1}$ for $k \geq 0$. Considering the initial condition $\alpha_0 = -\beta_0$, we obtain that $\alpha_k = -\beta_k$ for $k \geq 0$, which establishes the first claim in Lemma 1.

Based on the definitions $A = [A_s; A_d]$, $B = [-I_{mp}; -I_{mp}]$, and $\lambda = [\alpha; \beta]$, we can split the update for the Lagrange multiplier $\lambda$ in (13) as

$$\alpha_{k+1} = \alpha_k + c\left[A_s x_{k+1} - z_{k+1}\right], \tag{41}$$

$$\beta_{k+1} = \beta_k + c\left[A_d x_{k+1} - z_{k+1}\right]. \tag{42}$$

Observing the result that $\alpha_k = -\beta_k$ for $k \geq 0$, summing up the equations in (41) and (42) yields

$$(A_s + A_d)x_{k+1} = 2z_{k+1}. \tag{43}$$

Considering the definition of the unoriented incidence matrix $E_u = A_s + A_d$, we obtain that $E_u x_k = 2z_k$ holds for $k > 0$. According to the initial condition $E_u x_0 = 2z_0$, we can conclude that the relation $E_u x_k = 2z_k$ holds for $k \geq 0$.

Subtract the update for $\beta_k$ in (42) from the update for $\alpha_k$ in (41) and consider the relation $\beta_k = -\alpha_k$ to obtain

$$\alpha_{k+1} = \alpha_k + \frac{c}{2}(A_s - A_d)x_{k+1}. \tag{44}$$

Substituting $A_s - A_d$ in (44) by $E_o$ implies that

$$\alpha_{k+1} = \alpha_k + \frac{c}{2}E_o x_{k+1}. \tag{45}$$

Hence, if $\alpha_k$ lies in the column space of matrix $E_o$, then $\alpha_{k+1}$ also lies in the column space of $E_o$. According to the third condition of Assumption 1, $\alpha_0$ satisfies this condition, therefore $\alpha_k$ lies in the column space of matrix $E_o$ for all $k \geq 0$.


Appendix B
Proof of Proposition 1

The update for the multiplier $\lambda$ in (13) implies that we can substitute $\lambda_k$ by $\lambda_{k+1} - c(Ax_{k+1} + Bz_{k+1})$ to simplify the optimality condition of the DQM primal update as

$$\nabla f(x_k) + H_k(x_{k+1} - x_k) + A^T\lambda_{k+1} + cA^TB(z_k - z_{k+1}) = 0. \tag{46}$$

Considering the first result of Lemma 1 that $\alpha_k = -\beta_k$ for $k \geq 0$ in association with the definition $A = [A_s; A_d]$ implies that the product $A^T\lambda_{k+1}$ is equivalent to

$$A^T\lambda_{k+1} = A_s^T\alpha_{k+1} + A_d^T\beta_{k+1} = (A_s - A_d)^T\alpha_{k+1}. \tag{47}$$

According to the definition $E_o := A_s - A_d$, the right hand side of (47) can be simplified as

$$A^T\lambda_{k+1} = E_o^T\alpha_{k+1}. \tag{48}$$

Based on the structures of the matrices $A$ and $B$, and the definition $E_u := A_s + A_d$, we can simplify $A^TB$ as

$$A^TB = -A_s^T - A_d^T = -E_u^T. \tag{49}$$

Substituting the results in (48) and (49) into (46) leads to

$$\nabla f(x_k) + H_k(x_{k+1} - x_k) + E_o^T\alpha_{k+1} + cE_u^T(z_{k+1} - z_k) = 0. \tag{50}$$

The second result in Lemma 1 states that $z_k = E_u x_k/2$. Multiplying both sides of this equality by $E_u^T$ from the left we obtain that $E_u^T z_k = E_u^T E_u x_k/2$ for $k \geq 0$. Observing the definition of the unoriented Laplacian $L_u := E_u^T E_u/2$, we obtain that the product $E_u^T z_k$ is equal to $L_u x_k$ for $k \geq 0$. Therefore, in (50) we can substitute $E_u^T(z_{k+1} - z_k)$ by $L_u(x_{k+1} - x_k)$ and write

$$\nabla f(x_k) + (H_k + cL_u)(x_{k+1} - x_k) + E_o^T\alpha_{k+1} = 0. \tag{51}$$

Observe that the new variables $\phi_k$ are defined as $\phi_k := E_o^T\alpha_k$. Multiplying both sides of (45) by $E_o^T$ from the left hand side and considering the definition of the oriented Laplacian $L_o = E_o^T E_o/2$ yields the update rule of $\phi_k$ in (14), i.e.,

$$\phi_{k+1} = \phi_k + cL_o x_{k+1}. \tag{52}$$

According to the definition $\phi_k = E_o^T\alpha_k$ and the update formula in (52), we can conclude that $E_o^T\alpha_{k+1} = \phi_{k+1} = \phi_k + cL_o x_{k+1}$. Substituting $E_o^T\alpha_{k+1}$ by $\phi_k + cL_o x_{k+1}$ in (51) yields

$$\nabla f(x_k) + (H_k + cL_u)(x_{k+1} - x_k) + \phi_k + cL_o x_{k+1} = 0. \tag{53}$$

Observing the definition $D = (L_u + L_o)/2$ we rewrite (53) as

$$(H_k + 2cD)\,x_{k+1} = (H_k + cL_u)\,x_k - \nabla f(x_k) - \phi_k. \tag{54}$$

Multiplying both sides of (54) by $(H_k + 2cD)^{-1}$ from the left hand side yields the first update in (14).
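The recursion obtained in this appendix can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the network is a hypothetical ring of five nodes with scalar variables, the local costs are hypothetical quadratics $f_i(x_i) = \tfrac{1}{2}a_i(x_i - b_i)^2$, the Laplacians follow the convention that each undirected edge is counted in both directions, and the linear system is solved globally for brevity even though $H_k + 2cD$ is diagonal here.

```python
import numpy as np

def dqm_step(x, phi, grad, hess, c, L_u, L_o, D):
    """One iteration: primal update from (54), then the phi update from (52)."""
    H = hess(x)
    rhs = (H + c * L_u) @ x - grad(x) - phi
    x_next = np.linalg.solve(H + 2.0 * c * D, rhs)
    phi_next = phi + c * (L_o @ x_next)
    return x_next, phi_next

# Hypothetical ring network with n = 5 nodes and one scalar variable per node.
n = 5
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
D = np.diag(adj.sum(axis=1))     # degree matrix
L_o = D - adj                    # oriented Laplacian
L_u = D + adj                    # unoriented Laplacian, so D = (L_u + L_o) / 2

# Hypothetical local quadratic costs f_i(x_i) = 0.5 * a_i * (x_i - b_i)^2.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([1.0, -1.0, 2.0, 0.0, 3.0])
grad = lambda x: a * (x - b)
hess = lambda x: np.diag(a)

c = 1.0
x, phi = np.zeros(n), np.zeros(n)      # phi_0 = 0 lies in the range of E_o^T
for _ in range(500):
    x, phi = dqm_step(x, phi, grad, hess, c, L_u, L_o, D)

x_star = np.sum(a * b) / np.sum(a)     # minimizer of the sum of the local costs
print("local iterates:", x)
print("centralized optimum:", x_star)
```

For quadratic costs the Hessian model is exact, so this run only illustrates the mechanics of the two updates; it says nothing about how the quadratic approximation behaves on non-quadratic objectives such as the logistic loss.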


Appendix C
Proof of Lemma 2

Consider two arbitrary vectors $x := [x_1; \dots; x_n] \in \mathbb{R}^{np}$ and $\hat{x} := [\hat{x}_1; \dots; \hat{x}_n] \in \mathbb{R}^{np}$. Since the aggregate function Hessian is block diagonal where the $i$-th diagonal block is given by $\nabla^2 f_i(x_i)$, we obtain that the difference of Hessians $H(x) - H(\hat{x})$ is also block diagonal where the $i$-th diagonal block $H(x)_{ii} - H(\hat{x})_{ii}$ is

$$H(x)_{ii} - H(\hat{x})_{ii} = \nabla^2 f_i(x_i) - \nabla^2 f_i(\hat{x}_i). \tag{55}$$

Consider any vector $v \in \mathbb{R}^{np}$ and separate each $p$ components of vector $v$ and consider it as a new vector called $v_i \in \mathbb{R}^p$, i.e., $v := [v_1; \dots; v_n]$. Observing the relation for the difference $H(x) - H(\hat{x})$ in (55), the symmetry of matrices $H(x)$ and $H(\hat{x})$, and the definition of the Euclidean norm of a matrix that $\|A\| = \max_v \|Av\|/\|v\|$, we obtain that the squared difference norm $\|H(x) - H(\hat{x})\|^2$ can be written as

$$\|H(x) - H(\hat{x})\|^2 = \max_v \frac{v^T[H(x) - H(\hat{x})]^2 v}{\|v\|^2} = \max_v \frac{\sum_{i=1}^{n} v_i^T\left[\nabla^2 f_i(x_i) - \nabla^2 f_i(\hat{x}_i)\right]^2 v_i}{\|v\|^2}. \tag{56}$$

Using the Cauchy-Schwarz inequality we can write

$$v_i^T\left[\nabla^2 f_i(x_i) - \nabla^2 f_i(\hat{x}_i)\right]^2 v_i \leq \left\|\nabla^2 f_i(x_i) - \nabla^2 f_i(\hat{x}_i)\right\|^2 \|v_i\|^2. \tag{57}$$

Substituting the upper bound in (57) into (56) implies that the squared norm $\|H(x) - H(\hat{x})\|^2$ is bounded above as

$$\|H(x) - H(\hat{x})\|^2 \leq \max_v \frac{\sum_{i=1}^{n}\left\|\nabla^2 f_i(x_i) - \nabla^2 f_i(\hat{x}_i)\right\|^2 \|v_i\|^2}{\|v\|^2}. \tag{58}$$

Observe that Assumption 3 states that the local objective function Hessians $\nabla^2 f_i(x_i)$ are Lipschitz continuous with constant $L$, i.e., $\|\nabla^2 f_i(x_i) - \nabla^2 f_i(\hat{x}_i)\| \leq L\|x_i - \hat{x}_i\|$. Considering this inequality, the upper bound in (58) can be changed by replacing $\|\nabla^2 f_i(x_i) - \nabla^2 f_i(\hat{x}_i)\|$ by $L\|x_i - \hat{x}_i\|$, which yields

$$\|H(x) - H(\hat{x})\|^2 \leq \max_v \frac{L^2\sum_{i=1}^{n}\|x_i - \hat{x}_i\|^2 \|v_i\|^2}{\|v\|^2}. \tag{59}$$

Note that for any sequences of scalars such as $a_i$ and $b_i$, the inequality $\sum_{i=1}^{n} a_i^2 b_i^2 \leq \left(\sum_{i=1}^{n} a_i^2\right)\left(\sum_{i=1}^{n} b_i^2\right)$ holds. If we divide both sides of this relation by $\sum_{i=1}^{n} b_i^2$ and set $a_i = \|x_i - \hat{x}_i\|$ and $b_i = \|v_i\|$, we obtain

$$\frac{\sum_{i=1}^{n}\|x_i - \hat{x}_i\|^2 \|v_i\|^2}{\sum_{i=1}^{n}\|v_i\|^2} \leq \sum_{i=1}^{n}\|x_i - \hat{x}_i\|^2. \tag{60}$$

Combining the two inequalities in (59) and (60) leads to

$$\|H(x) - H(\hat{x})\|^2 \leq \max_v\ L^2\sum_{i=1}^{n}\|x_i - \hat{x}_i\|^2. \tag{61}$$

Since the right hand side of (61) does not depend on $v$ we can eliminate the maximization with respect to $v$. Further, note that according to the structure of vectors $x$ and $\hat{x}$, we can write $\|x - \hat{x}\|^2 = \sum_{i=1}^{n}\|x_i - \hat{x}_i\|^2$. These two observations in association with (61) imply that

$$\|H(x) - H(\hat{x})\|^2 \leq L^2\|x - \hat{x}\|^2. \tag{62}$$

Computing the square roots of terms in (62) yields (20).


Appendix D
Proofs of Propositions 2 and 3

The fundamental theorem of calculus implies that the difference of gradients $\nabla f(x_{k+1}) - \nabla f(x_k)$ can be written as

$$\nabla f(x_{k+1}) - \nabla f(x_k) = \int_0^1 H(sx_{k+1} + (1-s)x_k)(x_{k+1} - x_k)\,ds. \tag{63}$$

By computing norms of both sides of (63) and considering that the norm of an integral is smaller than the integral of the norm we obtain that

$$\|\nabla f(x_{k+1}) - \nabla f(x_k)\| \leq \int_0^1 \|H(sx_{k+1} + (1-s)x_k)(x_{k+1} - x_k)\|\,ds. \tag{64}$$

The upper bound $M$ for the eigenvalues of the Hessians as in (19) implies that $\|H(sx_{k+1} + (1-s)x_k)(x_{k+1} - x_k)\| \leq M\|x_{k+1} - x_k\|$. Substituting this upper bound into (64) leads to

$$\|\nabla f(x_{k+1}) - \nabla f(x_k)\| \leq M\|x_{k+1} - x_k\|. \tag{65}$$

The error vector norm $\|e_k^{DLM}\|$ in (34) is bounded above as

$$\|e_k^{DLM}\| \leq \|\nabla f(x_{k+1}) - \nabla f(x_k)\| + \rho\|x_{k+1} - x_k\|. \tag{66}$$

By substituting the upper bound for $\|\nabla f(x_{k+1}) - \nabla f(x_k)\|$ in (65) into (66), the claim in (35) follows.

To prove (22), first we show that $\|e_k^{DQM}\| \leq 2M\|x_{k+1} - x_k\|$ holds. Observe that the norm of the error vector $e_k^{DQM}$ defined in (21) can be upper bounded using the triangle inequality as

$$\|e_k^{DQM}\| \leq \|\nabla f(x_{k+1}) - \nabla f(x_k)\| + \|H_k(x_{k+1} - x_k)\|. \tag{67}$$

Based on the Cauchy-Schwarz inequality and the upper bound $M$ for the eigenvalues of Hessians as in (19), we obtain $\|H_k(x_{k+1} - x_k)\| \leq M\|x_{k+1} - x_k\|$. Further, as mentioned in (65) the difference of gradients $\|\nabla f(x_{k+1}) - \nabla f(x_k)\|$ is upper bounded by $M\|x_{k+1} - x_k\|$. Substituting these upper bounds for the terms in the right hand side of (67) yields

$$\|e_k^{DQM}\| \leq 2M\|x_{k+1} - x_k\|. \tag{68}$$

The next step is to show that $\|e_k^{DQM}\| \leq (L/2)\|x_{k+1} - x_k\|^2$. Adding and subtracting the integral $\int_0^1 H(x_k)(x_{k+1} - x_k)\,ds$ to the right hand side of (63) results in

$$\nabla f(x_{k+1}) - \nabla f(x_k) = \int_0^1 H(x_k)(x_{k+1} - x_k)\,ds + \int_0^1 \left[H(sx_{k+1} + (1-s)x_k) - H(x_k)\right](x_{k+1} - x_k)\,ds. \tag{69}$$

First observe that the integral $\int_0^1 H(x_k)(x_{k+1} - x_k)\,ds$ can be simplified as $H(x_k)(x_{k+1} - x_k)$. Observing this simplification and regrouping the terms yield

$$\nabla f(x_{k+1}) - \nabla f(x_k) - H(x_k)(x_{k+1} - x_k) = \int_0^1 \left[H(sx_{k+1} + (1-s)x_k) - H(x_k)\right](x_{k+1} - x_k)\,ds. \tag{70}$$

Computing norms of both sides of (70), considering the fact that the norm of an integral is smaller than the integral of the norm, and using the Cauchy-Schwarz inequality lead to

$$\|\nabla f(x_{k+1}) - \nabla f(x_k) - H(x_k)(x_{k+1} - x_k)\| \leq \int_0^1 \|H(sx_{k+1} + (1-s)x_k) - H(x_k)\|\,\|x_{k+1} - x_k\|\,ds. \tag{71}$$

Lipschitz continuity of the Hessian as in (20) implies that $\|H(sx_{k+1} + (1-s)x_k) - H(x_k)\| \leq sL\|x_{k+1} - x_k\|$. By substituting this upper bound into the integral in (71) and substituting the left hand side of (71) by $\|e_k^{DQM}\|$ we obtain

$$\|e_k^{DQM}\| \leq \int_0^1 sL\|x_{k+1} - x_k\|^2\,ds. \tag{72}$$

Simplification of the integral in (72) yields

$$\|e_k^{DQM}\| \leq \frac{L}{2}\|x_{k+1} - x_k\|^2. \tag{73}$$

The results in (68) and (73) yield the claim in (22).

Appendix E
Proof of Lemma 3

In this section we first introduce an equivalent version of Lemma 3 for the DLM algorithm. Then, we show the validity of both lemmata in a general proof.


Lemma 4 Consider the DLM algorithm. If Assumption 1 holds true, then the optimal arguments $x^*$, $z^*$, and $\alpha^*$ satisfy

$$\nabla f(x_{k+1}) - \nabla f(x^*) + e_k^{DLM} + E_o^T(\alpha_{k+1} - \alpha^*) - cE_u^T(z_k - z_{k+1}) = 0, \tag{74}$$

$$2(\alpha_{k+1} - \alpha_k) - cE_o(x_{k+1} - x^*) = 0, \tag{75}$$

$$E_u(x_k - x^*) - 2(z_k - z^*) = 0. \tag{76}$$

Notice that the claims in Lemmata 3 and 4 are identical except in the error term of the first equalities. To provide a general framework to prove the claim in these lemmata we introduce $e_k$ as the general error vector. By replacing $e_k$ with $e_k^{DQM}$ we obtain the result of DQM in Lemma 3, and by setting $e_k = e_k^{DLM}$ the result in Lemma 4 follows. We start with the following lemma that captures the KKT conditions of the decentralized optimization problem.

Lemma 5 Consider the decentralized optimization problem. The optimal Lagrange multiplier $\alpha^*$, primal variable $x^*$ and auxiliary variable $z^*$ satisfy the following system of equations

$$\nabla f(x^*) + E_o^T\alpha^* = 0, \qquad E_o x^* = 0, \qquad E_u x^* = 2z^*. \tag{77}$$
V/(x*) + E^a* = 0, EoX* = 0, E,x* = 2zT (77) 
Proof: First observe that the KKT conditions of the decentralized optimization problem are given by

$$\nabla f(x^*) + A^T\lambda^* = 0, \qquad B^T\lambda^* = 0, \qquad Ax^* + Bz^* = 0. \tag{78}$$

Based on the definitions of the matrix $B = [-I_{mp}; -I_{mp}]$ and the optimal Lagrange multiplier $\lambda^* := [\alpha^*; \beta^*]$, we obtain that $B^T\lambda^* = 0$ in (78) is equivalent to $\alpha^* = -\beta^*$. Considering this result and the definition $A = [A_s; A_d]$, we obtain

$$A^T\lambda^* = A_s^T\alpha^* + A_d^T\beta^* = (A_s - A_d)^T\alpha^*. \tag{79}$$

The definition $E_o := A_s - A_d$ implies that the right hand side of (79) can be simplified as $E_o^T\alpha^*$, which shows $A^T\lambda^* = E_o^T\alpha^*$. Substituting $A^T\lambda^*$ by $E_o^T\alpha^*$ into the first equality in (78) yields the first claim in (77).

Decompose the KKT condition $Ax^* + Bz^* = 0$ in (78) based on the definitions of $A$ and $B$ as

$$A_s x^* - z^* = 0, \qquad A_d x^* - z^* = 0. \tag{80}$$

Subtracting the equalities in (80) implies that $(A_s - A_d)x^* = 0$, from which, by considering the definition $E_o = A_s - A_d$, the second equation in (77) follows. Summing up the equalities in (80) yields $(A_s + A_d)x^* = 2z^*$. This observation in association with the definition $E_u = A_s + A_d$ yields the third equation in (77). ■

Proofs of Lemmata 3 and 4: First note that the results in Lemma 1 are also valid for DLM. Now, consider the first order optimality conditions for the primal updates of DQM and DLM. Further, recall the definitions of the error vectors $e_k^{DQM}$ and $e_k^{DLM}$ in (21) and (34), respectively. Combining these observations we obtain that

$$\nabla f(x_{k+1}) + e_k + A^T\lambda_k + cA^T(Ax_{k+1} + Bz_k) = 0. \tag{81}$$

Notice that by setting $e_k = e_k^{DQM}$ we obtain the update for the primal variable of DQM; likewise, setting $e_k = e_k^{DLM}$ yields the update of DLM.

Observe that the relation $\lambda_k = \lambda_{k+1} - c(Ax_{k+1} + Bz_{k+1})$ holds for both DLM and DQM according to the update formula for the Lagrange multiplier. Substituting $\lambda_k$ by $\lambda_{k+1} - c(Ax_{k+1} + Bz_{k+1})$ in (81) yields

$$\nabla f(x_{k+1}) + e_k + A^T\lambda_{k+1} + cA^TB(z_k - z_{k+1}) = 0. \tag{82}$$

Based on the result in Lemma 1, the components of the Lagrange multiplier $\lambda = [\alpha; \beta]$ satisfy $\alpha_{k+1} = -\beta_{k+1}$. Hence, the product $A^T\lambda_{k+1}$ can be simplified as $A_s^T\alpha_{k+1} - A_d^T\alpha_{k+1} = E_o^T\alpha_{k+1}$ considering the definition that $E_o = A_s - A_d$. Furthermore, note that according to the definitions we have that $A = [A_s; A_d]$ and $B = [-I; -I]$, which implies that $A^TB = -(A_s + A_d)^T = -E_u^T$. By making these substitutions into (82) we can write

$$\nabla f(x_{k+1}) + e_k + E_o^T\alpha_{k+1} - cE_u^T(z_k - z_{k+1}) = 0. \tag{83}$$

The first result in Lemma 5 is equivalent to $\nabla f(x^*) + E_o^T\alpha^* = 0$. Subtracting both sides of this equation from the relation in (83) yields the first claim of Lemmata 3 and 4.

We proceed to prove the second and third claims in Lemmata 3 and 4. The update formula for $\alpha_k$ in (45) and the second result in Lemma 5 that $E_o x^* = 0$ imply that the second claim of Lemmata 3 and 4 is valid. Further, the result in Lemma 1 guarantees that $E_u x_k = 2z_k$. This result in conjunction with the result in Lemma 5 that $E_u x^* = 2z^*$ leads to the third claim of Lemmata 3 and 4.


Appendix F
Proofs of Theorems 1 and 2

To prove Theorems 1 and 2 we show a sufficient condition for the claims in these theorems. Then, we prove these theorems by showing validity of the sufficient condition. To do so, we use the general coefficient $\beta_k$, which is equivalent to $\zeta_k$ in the DQM algorithm and equivalent to $\rho + M$ in the DLM method. These definitions and the results in Propositions 2 and 3 imply that

$$\|e_k\| \leq \beta_k\|x_{k+1} - x_k\|, \tag{84}$$

where $e_k$ is $e_k^{DQM}$ in DQM and $e_k^{DLM}$ in DLM. The sufficient condition of Theorems 1 and 2 is studied in the following lemma.


Lemma 6 Consider the DLM and DQM algorithms. Further, consider $\delta_k$ as a sequence of positive scalars. If Assumptions 1-4 hold true then the sequence $\|u_k - u^*\|_C^2$ converges linearly as

$$\|u_{k+1} - u^*\|_C^2 \leq \frac{1}{1+\delta_k}\,\|u_k - u^*\|_C^2, \tag{85}$$

if the following inequality holds true,

$$\beta_k\|x_{k+1} - x^*\|\,\|x_{k+1} - x_k\| + \delta_k c\|z_{k+1} - z^*\|^2 + \frac{\delta_k}{c}\|\alpha_{k+1} - \alpha^*\|^2 \leq m\|x_{k+1} - x^*\|^2 + c\|z_{k+1} - z_k\|^2 + \frac{1}{c}\|\alpha_{k+1} - \alpha_k\|^2. \tag{86}$$

Proof: Proving linear convergence of the sequence $\|u_k - u^*\|_C^2$ as mentioned in (85) is equivalent to showing that

$$\delta_k\|u_{k+1} - u^*\|_C^2 \leq \|u_k - u^*\|_C^2 - \|u_{k+1} - u^*\|_C^2. \tag{87}$$

According to the definition $\|a\|_C^2 := a^TCa$ we can show that

$$2(u_k - u_{k+1})^TC(u_{k+1} - u^*) = \|u_k - u^*\|_C^2 - \|u_{k+1} - u^*\|_C^2 - \|u_k - u_{k+1}\|_C^2. \tag{88}$$
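The identity in (88) is a weighted version of the usual expansion of a squared norm, and it can be spot-checked numerically. The Python sketch below assumes $u = [z; \alpha]$ and $C = \mathrm{diag}(cI, (1/c)I)$, which is the structure implied by the relation $\|u\|_C^2 = c\|z\|^2 + (1/c)\|\alpha\|^2$ used in this appendix; the dimension and the value of $c$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
d, c = 4, 0.7                                  # arbitrary block size and penalty

# Weight matrix with blocks c*I (for z) and (1/c)*I (for alpha).
C = np.diag(np.concatenate([c * np.ones(d), np.ones(d) / c]))

def sq_norm_C(a):
    """Squared weighted norm a^T C a."""
    return float(a @ C @ a)

u_k, u_next, u_star = rng.normal(size=(3, 2 * d))

lhs = 2.0 * (u_k - u_next) @ C @ (u_next - u_star)
rhs = (sq_norm_C(u_k - u_star)
       - sq_norm_C(u_next - u_star)
       - sq_norm_C(u_k - u_next))
print("left side:", lhs, " right side:", rhs)
```

The two sides agree up to floating-point rounding for any choice of the three vectors, since the identity follows from expanding $\|(u_k - u_{k+1}) + (u_{k+1} - u^*)\|_C^2$.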


The relation in (88) shows that the right hand side of (87) can be substituted by $2(u_k - u_{k+1})^TC(u_{k+1} - u^*) + \|u_k - u_{k+1}\|_C^2$. Applying this substitution into (87) leads to

$$\delta_k\|u_{k+1} - u^*\|_C^2 \leq 2(u_k - u_{k+1})^TC(u_{k+1} - u^*) + \|u_k - u_{k+1}\|_C^2. \tag{89}$$

This observation implies that to prove the linear convergence as claimed in (85), the inequality in (89) should be satisfied.

We proceed by finding a lower bound for the term $2(u_k - u_{k+1})^TC(u_{k+1} - u^*)$ in (89). By regrouping the terms in (24) and multiplying both sides of the equality by $(x_{k+1} - x^*)^T$ from the left hand side we obtain that the inner product $(x_{k+1} - x^*)^T(\nabla f(x_{k+1}) - \nabla f(x^*))$ is equivalent to

$$(x_{k+1} - x^*)^T(\nabla f(x_{k+1}) - \nabla f(x^*)) = -(x_{k+1} - x^*)^Te_k - (x_{k+1} - x^*)^TE_o^T(\alpha_{k+1} - \alpha^*) + c(x_{k+1} - x^*)^TE_u^T(z_k - z_{k+1}). \tag{90}$$

Based on (25), we can substitute $(x_{k+1} - x^*)^TE_o^T(\alpha_{k+1} - \alpha^*)$ in (90) by $(2/c)(\alpha_{k+1} - \alpha_k)^T(\alpha_{k+1} - \alpha^*)$. Further, the result in (26) implies that the term $c(x_{k+1} - x^*)^TE_u^T(z_k - z_{k+1})$ in (90) is equivalent to $2c(z_k - z_{k+1})^T(z_{k+1} - z^*)$. Applying these substitutions into (90) leads to

$$(x_{k+1} - x^*)^T(\nabla f(x_{k+1}) - \nabla f(x^*)) = -(x_{k+1} - x^*)^Te_k + \frac{2}{c}(\alpha_k - \alpha_{k+1})^T(\alpha_{k+1} - \alpha^*) + 2c(z_k - z_{k+1})^T(z_{k+1} - z^*). \tag{91}$$
c 
Based on the definitions of matrix $C$ and vector $u$ in (28), the last two summands in the right hand side of (91) can be simplified as

$$\frac{2}{c}(\alpha_k - \alpha_{k+1})^T(\alpha_{k+1} - \alpha^*) + 2c(z_k - z_{k+1})^T(z_{k+1} - z^*) = 2(u_k - u_{k+1})^TC(u_{k+1} - u^*). \tag{92}$$

Considering the simplification in (92) we can rewrite (91) as

$$(x_{k+1} - x^*)^T(\nabla f(x_{k+1}) - \nabla f(x^*)) = -(x_{k+1} - x^*)^Te_k + 2(u_k - u_{k+1})^TC(u_{k+1} - u^*). \tag{93}$$

Observe that the objective function $f$ is strongly convex with constant $m$, which implies that the inequality $m\|x_{k+1} - x^*\|^2 \leq (x_{k+1} - x^*)^T(\nabla f(x_{k+1}) - \nabla f(x^*))$ holds true. Considering this inequality from the strong convexity of the objective function $f$ and the simplification for the inner product $(x_{k+1} - x^*)^T(\nabla f(x_{k+1}) - \nabla f(x^*))$ in (93), the following inequality holds

$$m\|x_{k+1} - x^*\|^2 + (x_{k+1} - x^*)^Te_k \leq 2(u_k - u_{k+1})^TC(u_{k+1} - u^*). \tag{94}$$

Substituting the lower bound for the term $2(u_k - u_{k+1})^TC(u_{k+1} - u^*)$ in (94) into (89), it follows that the following condition is sufficient to have (85),

$$\delta_k\|u_{k+1} - u^*\|_C^2 \leq m\|x_{k+1} - x^*\|^2 + (x_{k+1} - x^*)^Te_k + \|u_k - u_{k+1}\|_C^2. \tag{95}$$

We emphasize that inequality (95) implies the linear convergence result in (85). Therefore, our goal is to show that if (86) holds, the relation in (95) is also valid and consequently the result in (85) holds. According to the definitions of matrix $C$ and vector $u$ in (28), we can substitute $\|u_{k+1} - u^*\|_C^2$ by $c\|z_{k+1} - z^*\|^2 + (1/c)\|\alpha_{k+1} - \alpha^*\|^2$ and $\|u_k - u_{k+1}\|_C^2$ by $c\|z_{k+1} - z_k\|^2 + (1/c)\|\alpha_{k+1} - \alpha_k\|^2$. Making these substitutions into (95) yields

$$\delta_k c\|z_{k+1} - z^*\|^2 + \frac{\delta_k}{c}\|\alpha_{k+1} - \alpha^*\|^2 \leq m\|x_{k+1} - x^*\|^2 + (x_{k+1} - x^*)^Te_k + c\|z_{k+1} - z_k\|^2 + \frac{1}{c}\|\alpha_{k+1} - \alpha_k\|^2. \tag{96}$$

The inequality in (84) implies that $-\|e_k\|$ is lower bounded by $-\beta_k\|x_{k+1} - x_k\|$. This lower bound in conjunction with the fact that the inner product of two vectors is not smaller than the negative of the product of their norms leads to

$$(x_{k+1} - x^*)^Te_k \geq -\beta_k\|x_{k+1} - x^*\|\,\|x_{k+1} - x_k\|. \tag{97}$$

Substituting $(x_{k+1} - x^*)^Te_k$ in (96) by its lower bound in (97) leads to a sufficient condition for (96) as in (86), i.e.,

$$\beta_k\|x_{k+1} - x^*\|\,\|x_{k+1} - x_k\| + \delta_k c\|z_{k+1} - z^*\|^2 + \frac{\delta_k}{c}\|\alpha_{k+1} - \alpha^*\|^2 \leq m\|x_{k+1} - x^*\|^2 + c\|z_{k+1} - z_k\|^2 + \frac{1}{c}\|\alpha_{k+1} - \alpha_k\|^2. \tag{98}$$

Observe that if (98) holds true, then (96) and its equivalent (95) are valid and as a result the inequality in (85) is also satisfied. ■

According to the result in Lemma 6, the sequence $\|u_k - u^*\|_C^2$ converges linearly as mentioned in (85) if the inequality in (86) holds true. Therefore, in the following proof we show that for

$$\delta_k = \min\left\{ \frac{(\mu-1)(c\gamma_u^2-\eta_k\beta_k)\gamma_o^2}{\mu\mu'\left(c\Gamma_u^2\gamma_u^2+4\beta_k^2/(c(\mu'-1))\right)},\ \frac{m-\beta_k/\eta_k}{c\Gamma_u^2/4+\mu M^2/(c\gamma_o^2)} \right\}, \tag{99}$$

the inequality in (86) holds and consequently (85) is valid.


Proofs of Theorems 1 and 2: We show that if the constant $\delta_k$ is chosen as in (99) then the inequality in (86) holds true. To do this, first we should find an upper bound for $\beta_k\|x_{k+1} - x^*\|\,\|x_{k+1} - x_k\|$ regarding the terms in the right hand side of (86). Observing the result of Lemma 1 that $E_u x_k = 2z_k$ for times $k$ and $k+1$, we can write

$$E_u(x_{k+1} - x_k) = 2(z_{k+1} - z_k). \tag{100}$$

The singular values of $E_u$ are bounded below by $\gamma_u$. Hence, equation (100) implies that $\|x_{k+1} - x_k\|$ is upper bounded by

$$\|x_{k+1} - x_k\| \leq \frac{2}{\gamma_u}\|z_{k+1} - z_k\|. \tag{101}$$

Multiplying both sides of (101) by $\beta_k\|x_{k+1} - x^*\|$ yields

$$\beta_k\|x_{k+1} - x^*\|\,\|x_{k+1} - x_k\| \leq \frac{2\beta_k}{\gamma_u}\|x_{k+1} - x^*\|\,\|z_{k+1} - z_k\|. \tag{102}$$

Notice that for any vectors $a$ and $b$ and positive constant $\eta_k > 0$ the inequality $2\|a\|\|b\| \leq (1/\eta_k)\|a\|^2 + \eta_k\|b\|^2$ holds true. By setting $a = x_{k+1} - x^*$ and $b = (1/\gamma_u)(z_{k+1} - z_k)$ the inequality $2\|a\|\|b\| \leq (1/\eta_k)\|a\|^2 + \eta_k\|b\|^2$ is equivalent to

$$\frac{2}{\gamma_u}\|x_{k+1} - x^*\|\,\|z_{k+1} - z_k\| \leq \frac{1}{\eta_k}\|x_{k+1} - x^*\|^2 + \frac{\eta_k}{\gamma_u^2}\|z_{k+1} - z_k\|^2. \tag{103}$$

Substituting the upper bound for $(2/\gamma_u)\|x_{k+1} - x^*\|\,\|z_{k+1} - z_k\|$ in (103) into (102) yields

$$\beta_k\|x_{k+1} - x^*\|\,\|x_{k+1} - x_k\| \leq \frac{\beta_k}{\eta_k}\|x_{k+1} - x^*\|^2 + \frac{\eta_k\beta_k}{\gamma_u^2}\|z_{k+1} - z_k\|^2. \tag{104}$$

Notice that inequality (104) provides an upper bound for $\beta_k\|x_{k+1} - x^*\|\,\|x_{k+1} - x_k\|$ in (86) regarding the terms in the right hand side of the inequality, which are $\|x_{k+1} - x^*\|^2$ and $\|z_{k+1} - z_k\|^2$. The next step is to find upper bounds for the other two terms in the left hand side of (86) regarding the terms in the right hand side of (86), which are $\|x_{k+1} - x^*\|^2$, $\|z_{k+1} - z_k\|^2$, and $\|\alpha_{k+1} - \alpha_k\|^2$. First we start with $\delta_k c\|z_{k+1} - z^*\|^2$. The relation in (26) and the upper bound $\Gamma_u$ for the singular values of matrix $E_u$ yield

$$\delta_k c\|z_{k+1} - z^*\|^2 \leq \frac{\delta_k c\Gamma_u^2}{4}\|x_{k+1} - x^*\|^2. \tag{105}$$


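The next step is to bound $(\delta_k/c)\|\boldsymbol{\alpha}_{k+1}-\boldsymbol{\alpha}^*\|^2$
in terms of the terms on the right-hand side of (47). First, note that for
any vectors $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$, and constants $\mu$ and $\mu'$
that are larger than 1, i.e., $\mu, \mu' > 1$, we can write

$$ \Big(1-\frac{1}{\mu'}\Big)\Big(1-\frac{1}{\mu}\Big)\|\mathbf{c}\|^2 \le \|\mathbf{a}+\mathbf{b}+\mathbf{c}\|^2 + (\mu'-1)\|\mathbf{a}\|^2 + (\mu-1)\Big(1-\frac{1}{\mu'}\Big)\|\mathbf{b}\|^2. \tag{106} $$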
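Inequality (106) follows from applying $\|\mathbf{u}+\mathbf{v}\|^2 \ge (1-\mu)\|\mathbf{u}\|^2 + (1-1/\mu)\|\mathbf{v}\|^2$ twice, and it can be sanity-checked numerically. The sketch below is illustrative only: `slack_106` and `smallest_slack`, the dimension, and the sampling ranges for $\mu$ and $\mu'$ are assumptions made for this example.

```python
import random


def sqnorm(v):
    """Squared Euclidean norm of a list of floats."""
    return sum(t * t for t in v)


def slack_106(a, b, c_vec, mu, mu_p):
    """Right-hand side minus left-hand side of (106); nonnegative when (106) holds."""
    total = [ai + bi + ci for ai, bi, ci in zip(a, b, c_vec)]
    lhs = (1.0 - 1.0 / mu_p) * (1.0 - 1.0 / mu) * sqnorm(c_vec)
    rhs = (sqnorm(total)
           + (mu_p - 1.0) * sqnorm(a)
           + (mu - 1.0) * (1.0 - 1.0 / mu_p) * sqnorm(b))
    return rhs - lhs


def smallest_slack(trials=2000, dim=4, seed=0):
    """Smallest slack over random vectors and random mu, mu' > 1 (assumed ranges)."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        b = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        c_vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        mu = rng.uniform(1.01, 5.0)
        mu_p = rng.uniform(1.01, 5.0)
        worst = min(worst, slack_106(a, b, c_vec, mu, mu_p))
    return worst


if __name__ == "__main__":
    print("smallest slack:", smallest_slack())
    # Tight case: b = -c/mu and a = -(b + c)/mu' give zero slack.
    print("tight case:", slack_106([-0.25], [-0.5], [1.0], 2.0, 2.0))
```

The tight case shows that (106) cannot be improved for general vectors, so any looseness in the final rate comes from how $\mu$ and $\mu'$ are chosen.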

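Set $\mathbf{a} = c\,\mathbf{E}_u^T(\mathbf{z}_k-\mathbf{z}_{k+1})$,
$\mathbf{b} = \nabla f(\mathbf{x}^*)-\nabla f(\mathbf{x}_{k+1})$, and
$\mathbf{c} = \mathbf{E}_o^T(\boldsymbol{\alpha}^*-\boldsymbol{\alpha}_{k+1})$. With these choices,
equality (24) gives $\mathbf{a}+\mathbf{b}+\mathbf{c} = \mathbf{e}_k$. Hence, making these
substitutions for $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$, and $\mathbf{a}+\mathbf{b}+\mathbf{c}$ in (106), we can write

$$ \Big(1-\frac{1}{\mu'}\Big)\Big(1-\frac{1}{\mu}\Big)\|\mathbf{E}_o^T(\boldsymbol{\alpha}_{k+1}-\boldsymbol{\alpha}^*)\|^2 \le \|\mathbf{e}_k\|^2 + (\mu'-1)\|c\,\mathbf{E}_u^T(\mathbf{z}_k-\mathbf{z}_{k+1})\|^2 + (\mu-1)\Big(1-\frac{1}{\mu'}\Big)\|\nabla f(\mathbf{x}_{k+1})-\nabla f(\mathbf{x}^*)\|^2. \tag{107} $$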
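Notice that according to the result in Lemma 1 the Lagrange
multiplier $\boldsymbol{\alpha}_k$ lies in the column space of $\mathbf{E}_o$ for all $k \ge 0$.
Further, recall that the optimal multiplier $\boldsymbol{\alpha}^*$ also lies in the
column space of $\mathbf{E}_o$. These observations show that
$\boldsymbol{\alpha}^*-\boldsymbol{\alpha}_{k+1}$ is in the column space of $\mathbf{E}_o$. Hence, there
exists a vector $\mathbf{r}$ such that $\boldsymbol{\alpha}^*-\boldsymbol{\alpha}_{k+1} = \mathbf{E}_o\mathbf{r}$.
This relation implies that $\|\mathbf{E}_o^T(\boldsymbol{\alpha}_{k+1}-\boldsymbol{\alpha}^*)\|^2$ can
be written as $\|\mathbf{E}_o^T\mathbf{E}_o\mathbf{r}\|^2 = \mathbf{r}^T(\mathbf{E}_o^T\mathbf{E}_o)^2\mathbf{r}$.
Since the eigenvalues of the matrix $(\mathbf{E}_o^T\mathbf{E}_o)^2$ are the squares of the
eigenvalues of the matrix $\mathbf{E}_o^T\mathbf{E}_o$, we can write
$\mathbf{r}^T(\mathbf{E}_o^T\mathbf{E}_o)^2\mathbf{r} \ge \gamma_o^2\,\mathbf{r}^T\mathbf{E}_o^T\mathbf{E}_o\mathbf{r}$,
where $\gamma_o$ is the smallest non-zero singular value of the oriented
incidence matrix $\mathbf{E}_o$. Combining this inequality with the definition
$\boldsymbol{\alpha}^*-\boldsymbol{\alpha}_{k+1} = \mathbf{E}_o\mathbf{r}$, we can write

$$ \|\mathbf{E}_o^T(\boldsymbol{\alpha}_{k+1}-\boldsymbol{\alpha}^*)\|^2 \ge \gamma_o^2\,\|\boldsymbol{\alpha}_{k+1}-\boldsymbol{\alpha}^*\|^2. \tag{108} $$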
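The role of the column-space condition in (108) can be illustrated on a toy example. The sketch below is an assumption-laden illustration rather than the network model of the paper: it uses the edge-by-node oriented incidence matrix of a directed cycle with $n = 6$ nodes and scalar variables, for which the squared smallest non-zero singular value is $2 - 2\cos(2\pi/n)$. The helper names are hypothetical.

```python
import math
import random


def incidence_cycle(n):
    """Edge-by-node oriented incidence matrix of a directed cycle (toy graph)."""
    E = [[0.0] * n for _ in range(n)]
    for e in range(n):
        E[e][e] = 1.0             # edge e leaves node e
        E[e][(e + 1) % n] = -1.0  # edge e enters node e + 1 (mod n)
    return E


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def sqnorm(v):
    return sum(t * t for t in v)


def column_space_slack(n=6, trials=500, seed=1):
    """Smallest value of ||E^T v||^2 - gamma_o^2 ||v||^2 over random v = E r."""
    E = incidence_cycle(n)
    Et = transpose(E)
    gamma_o_sq = 2.0 - 2.0 * math.cos(2.0 * math.pi / n)
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        r = [rng.gauss(0.0, 1.0) for _ in range(n)]
        v = matvec(E, r)  # v lies in the column space of E by construction
        worst = min(worst, sqnorm(matvec(Et, v)) - gamma_o_sq * sqnorm(v))
    return worst


def outside_column_space(n=6):
    """||E^T 1||^2 for the all-ones edge vector, which is not in the column space."""
    E = incidence_cycle(n)
    return sqnorm(matvec(transpose(E), [1.0] * n))


if __name__ == "__main__":
    print("smallest slack inside column space:", column_space_slack())
    print("||E^T 1||^2 outside column space:", outside_column_space())
```

For the all-ones edge vector the product with the transposed incidence matrix vanishes, so a bound of the form (108) fails outside the column space. This is why the argument needs the multipliers to lie in the column space of $\mathbf{E}_o$.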


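Observe that the error norm $\|\mathbf{e}_k\|$ is bounded above by
$\beta_k\|\mathbf{x}_{k+1}-\mathbf{x}_k\|$ as in (84), and the norm
$\|c\,\mathbf{E}_u^T(\mathbf{z}_k-\mathbf{z}_{k+1})\|^2$ is upper bounded by
$c^2\Gamma_u^2\|\mathbf{z}_k-\mathbf{z}_{k+1}\|^2$ since all the singular values of
the unoriented matrix $\mathbf{E}_u$ are at most $\Gamma_u$. Substituting these
upper bounds, the gradient bound
$\|\nabla f(\mathbf{x}_{k+1})-\nabla f(\mathbf{x}^*)\| \le M\|\mathbf{x}_{k+1}-\mathbf{x}^*\|$,
and the lower bound in (108) into (107) implies

$$ \Big(1-\frac{1}{\mu'}\Big)\Big(1-\frac{1}{\mu}\Big)\gamma_o^2\|\boldsymbol{\alpha}_{k+1}-\boldsymbol{\alpha}^*\|^2 \le \beta_k^2\|\mathbf{x}_{k+1}-\mathbf{x}_k\|^2 + (\mu'-1)c^2\Gamma_u^2\|\mathbf{z}_k-\mathbf{z}_{k+1}\|^2 + (\mu-1)\Big(1-\frac{1}{\mu'}\Big)M^2\|\mathbf{x}_{k+1}-\mathbf{x}^*\|^2. \tag{109} $$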

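Considering the result in (101), $\|\mathbf{x}_{k+1}-\mathbf{x}_k\|$ is upper bounded by
$(2/\gamma_u)\|\mathbf{z}_{k+1}-\mathbf{z}_k\|$. Therefore, we can replace
$\|\mathbf{x}_{k+1}-\mathbf{x}_k\|$ on the right-hand side of (109) by its upper bound
$(2/\gamma_u)\|\mathbf{z}_{k+1}-\mathbf{z}_k\|$. Making this substitution, dividing both
sides by $(1-1/\mu')(1-1/\mu)\gamma_o^2$, and regrouping the terms lead to

$$ \|\boldsymbol{\alpha}_{k+1}-\boldsymbol{\alpha}^*\|^2 \le \frac{\mu M^2}{\gamma_o^2}\|\mathbf{x}_{k+1}-\mathbf{x}^*\|^2 + \left[\frac{4\mu\mu'\beta_k^2}{\gamma_u^2\gamma_o^2(\mu-1)(\mu'-1)} + \frac{\mu\mu' c^2\Gamma_u^2}{(\mu-1)\gamma_o^2}\right]\|\mathbf{z}_{k+1}-\mathbf{z}_k\|^2. \tag{110} $$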


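Considering the upper bounds for
$\beta_k\|\mathbf{x}_{k+1}-\mathbf{x}^*\|\,\|\mathbf{x}_{k+1}-\mathbf{x}_k\|$,
$\|\mathbf{z}_{k+1}-\mathbf{z}^*\|^2$, and $\|\boldsymbol{\alpha}_{k+1}-\boldsymbol{\alpha}^*\|^2$ in (104), (105), and (110),
respectively, we obtain that if the inequality

$$ \left[\frac{\beta_k}{\eta_k} + \frac{\delta_k c\,\Gamma_u^2}{4} + \frac{\delta_k\mu M^2}{c\gamma_o^2}\right]\|\mathbf{x}_{k+1}-\mathbf{x}^*\|^2 + \left[\frac{4\delta_k\mu\mu'\beta_k^2}{c\gamma_u^2\gamma_o^2(\mu-1)(\mu'-1)} + \frac{\delta_k\mu\mu' c\,\Gamma_u^2}{(\mu-1)\gamma_o^2} + \frac{\eta_k\beta_k}{\gamma_u^2}\right]\|\mathbf{z}_{k+1}-\mathbf{z}_k\|^2 \le m\|\mathbf{x}_{k+1}-\mathbf{x}^*\|^2 + c\,\|\mathbf{z}_{k+1}-\mathbf{z}_k\|^2 + \frac{1}{c}\|\boldsymbol{\alpha}_{k+1}-\boldsymbol{\alpha}_k\|^2 \tag{111} $$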


holds true, (86) is satisfied. Hence, the last step is to show that
for the specific choice of $\delta_k$ in (99) the inequality in (111) is satisfied.
To make sure that (111) holds, it is sufficient to show
that the coefficients of $\|\mathbf{x}_{k+1}-\mathbf{x}^*\|^2$ and
$\|\mathbf{z}_{k+1}-\mathbf{z}_k\|^2$ on the left-hand side of (111) are no larger than
the ones on the right-hand side. Hence, we should verify the validity of
the inequalities

$$ \frac{\beta_k}{\eta_k} + \frac{\delta_k c\,\Gamma_u^2}{4} + \frac{\delta_k\mu M^2}{c\gamma_o^2} \le m, \tag{112} $$

$$ \frac{4\delta_k\mu\mu'\beta_k^2}{c\gamma_u^2\gamma_o^2(\mu-1)(\mu'-1)} + \frac{\delta_k\mu\mu' c\,\Gamma_u^2}{(\mu-1)\gamma_o^2} + \frac{\eta_k\beta_k}{\gamma_u^2} \le c. \tag{113} $$
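Solving (112) and (113) for $\delta_k$ gives exactly the two terms inside the minimum in (99). The following sketch evaluates that bound and then checks both inequalities; the function names and all parameter values are arbitrary assumptions chosen only so that $\beta_k/m < \eta_k < c\gamma_u^2/\beta_k$, which keeps both terms of (99) positive.

```python
def delta_bound(m, M, c, beta, eta, mu, mu_p, gamma_u, Gamma_u, gamma_o):
    """Right-hand side of (99): the largest delta_k allowed by (112) and (113)."""
    first = ((mu - 1.0) * (c * gamma_u ** 2 - eta * beta) * gamma_o ** 2
             / (mu * mu_p * (c * Gamma_u ** 2 * gamma_u ** 2
                             + 4.0 * beta ** 2 / (c * (mu_p - 1.0)))))
    second = ((m - beta / eta)
              / (c * Gamma_u ** 2 / 4.0 + mu * M ** 2 / (c * gamma_o ** 2)))
    return min(first, second)


def lhs_112(delta, M, c, beta, eta, mu, Gamma_u, gamma_o, **_):
    """Left-hand side of (112); must not exceed m."""
    return (beta / eta + delta * c * Gamma_u ** 2 / 4.0
            + delta * mu * M ** 2 / (c * gamma_o ** 2))


def lhs_113(delta, c, beta, eta, mu, mu_p, gamma_u, Gamma_u, gamma_o, **_):
    """Left-hand side of (113); must not exceed c."""
    return (4.0 * delta * mu * mu_p * beta ** 2
            / (c * gamma_u ** 2 * gamma_o ** 2 * (mu - 1.0) * (mu_p - 1.0))
            + delta * mu * mu_p * c * Gamma_u ** 2 / ((mu - 1.0) * gamma_o ** 2)
            + eta * beta / gamma_u ** 2)


# Hypothetical parameter values, not taken from the paper's experiments.
params = dict(m=1.0, M=4.0, c=1.0, beta=0.5, eta=1.0, mu=2.0, mu_p=2.0,
              gamma_u=1.0, Gamma_u=2.0, gamma_o=1.0)

if __name__ == "__main__":
    delta = delta_bound(**params)
    print("delta bound:", delta)
    print("(112):", lhs_112(delta, **params), "<=", params["m"])
    print("(113):", lhs_113(delta, **params), "<=", params["c"])
```

With these values the second term of the minimum is the active one, so (112) holds with equality while (113) holds with slack. Which term is active in general depends on the problem and network constants.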


Considering the inequality for $\delta_k$ in (99), we obtain that (112) and
(113) are satisfied. Hence, if $\delta_k$ satisfies the condition in (99), then (111)
and consequently (86) are satisfied. Recalling from the lemma that
inequality (86) is a sufficient condition for the linear convergence
in (85), we obtain that the linear convergence holds. Setting
$\beta_k = \zeta_k$ shows that the linear convergence of DQM in Theorem 1
is valid, and the linear coefficient in (99) simplifies to the one stated
there. Moreover, setting $\beta_k = \rho + M$ yields the linear convergence
of DLM as in Theorem 2 with the linear
constant in